forked from vllm-project/vllm
-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[MFM] Merge changes from vllm-project/vllm main (#28)
* [Misc] Use VisionArena Dataset for VLM Benchmarking (vllm-project#12389) Signed-off-by: Roger Wang <ywang@roblox.com> * [ci/build] fix wheel size check (vllm-project#12396) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Hardware][Gaudi][Doc] Add missing step in setup instructions (vllm-project#12382) * [ci/build] sync default value for wheel size (vllm-project#12398) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Misc] Enable proxy support in benchmark script (vllm-project#12356) Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp> * [Bugfix][Kernel] Fix CUDA 11.8 being broken by FA3 build (vllm-project#12375) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> * [Misc] Remove deprecated code (vllm-project#12383) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix][Kernel] FA3 Fix - RuntimeError: This flash attention build only supports pack_gqa (for build size reasons). (vllm-project#12405) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> * [Bugfix][Kernel] Fix moe align block issue for mixtral (vllm-project#12413) * [Bugfix] Fix BLIP-2 processing (vllm-project#12412) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (vllm-project#12408) Signed-off-by: Divakar Verma <divakar.verma@amd.com> * [Misc] Add FA2 support to ViT MHA layer (vllm-project#12355) Signed-off-by: Isotr0py <2037008807@qq.com> * [TPU][CI] Update torchxla version in requirement-tpu.txt (vllm-project#12422) Signed-off-by: Siyuan Liu <lsiyuan@google.com> * [Misc][Bugfix] FA3 support to ViT MHA layer (vllm-project#12435) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [V1][Perf] Reduce scheduling overhead in model runner after cuda sync (vllm-project#12094) Signed-off-by: Keyun Tong <tongkeyun@gmail.com> * [V1][Bugfix] Fix assertion when mm hashing is turned off (vllm-project#12439) Signed-off-by: Roger Wang <ywang@roblox.com> * [Misc] Revert FA on ViT vllm-project#12355 and vllm-project#12435 (vllm-project#12445) * [Frontend] generation_config.json for maximum tokens(vllm-project#12242) Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com> Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: shangmingc <caishangming@linux.alibaba.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> * [Bugfix] Disable w16a16 2of4 sparse CompressedTensors24 (vllm-project#12417) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com> * [Bugfix/CI] Fix broken kernels/test_mha.py (vllm-project#12450) * [Bugfix][Kernel] Fix perf regression caused by PR vllm-project#12405 (vllm-project#12434) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> * [Build/CI] Fix libcuda.so linkage (vllm-project#12424) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * [Frontend] Rerank API (Jina- and Cohere-compatible API) (vllm-project#12376) Signed-off-by: Kyle Mistele <kyle@mistele.com> * [DOC] Add link to vLLM blog (vllm-project#12460) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * [V1] Avoid list creation in input preparation (vllm-project#12457) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Frontend] Support scores endpoint in run_batch (vllm-project#12430) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io> * [Bugfix] Fix Granite 3.0 MoE model loading (vllm-project#12446) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix missing seq_start_loc in xformers prefill metadata (vllm-project#12464) Signed-off-by: Isotr0py <2037008807@qq.com> * [V1][Minor] Minor optimizations for update_from_output (vllm-project#12454) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Bugfix] Fix gpt2 GGUF inference (vllm-project#12467) Signed-off-by: Isotr0py <2037008807@qq.com> --------- Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Divakar Verma <divakar.verma@amd.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Siyuan Liu <lsiyuan@google.com> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com> Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Kyle Mistele <kyle@mistele.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai> Co-authored-by: Junichi Sato <junichi.sato@sbintuitions.co.jp> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Keyun Tong <tongkeyun@gmail.com> Co-authored-by: Matthew Hendrey <matthew.hendrey@gmail.com> Co-authored-by: shangmingc <caishangming@linux.alibaba.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Kyle Mistele <kyle@mistele.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Pooya Davoodi <pooya.davoodi@parasail.io>
- Loading branch information
1 parent
a1a36f3
commit e8e548c
Showing
68 changed files
with
3,639 additions
and
526 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# vLLM Blog | ||
|
||
vLLM blog posts are published [here](https://blog.vllm.ai/). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
""" | ||
Example of using the OpenAI entrypoint's rerank API which is compatible with | ||
the Cohere SDK: https://github.com/cohere-ai/cohere-python | ||
run: vllm serve BAAI/bge-reranker-base | ||
""" | ||
import cohere | ||
|
||
# cohere v1 client | ||
co = cohere.Client(base_url="http://localhost:8000", api_key="sk-fake-key") | ||
rerank_v1_result = co.rerank( | ||
model="BAAI/bge-reranker-base", | ||
query="What is the capital of France?", | ||
documents=[ | ||
"The capital of France is Paris", "Reranking is fun!", | ||
"vLLM is an open-source framework for fast AI serving" | ||
]) | ||
|
||
print(rerank_v1_result) | ||
|
||
# or the v2 | ||
co2 = cohere.ClientV2("sk-fake-key", base_url="http://localhost:8000") | ||
|
||
v2_rerank_result = co2.rerank( | ||
model="BAAI/bge-reranker-base", | ||
query="What is the capital of France?", | ||
documents=[ | ||
"The capital of France is Paris", "Reranking is fun!", | ||
"vLLM is an open-source framework for fast AI serving" | ||
]) | ||
|
||
print(v2_rerank_result) |
Oops, something went wrong.