This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

[ CI ] Upstream sync to v0.4.3 branch #377

Closed
wants to merge 77 commits into from

Conversation

robertgshaw2-neuralmagic
Collaborator

SUMMARY:

  • Upstream sync to v0.4.3 of vLLM
  • git cherry-pick f68470e803df575f294e67167b4b83adfe004cfa..1197e02141df1a7442f21ff6922c98ec0bba153e
  • vllm-project@f68470e
  • vllm-project@1197e02 (corresponds to upstream v0.4.3)

Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Allow dummy load format for fp8;
torch.uniform_ doesn't support FP8 at the moment

Co-authored-by: Mor Zusman <morz@ai21.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
…ct#4893)

The 2nd PR for vllm-project#4532.

This PR adds support for loading FP8 kv-cache scaling factors from an FP8 checkpoint (with a .kv_scale parameter).
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
youkaichao and others added 29 commits July 14, 2024 21:40
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
…#5112)

Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Co-authored-by: omkarkakarparthi <okakarpa>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
…e ::ordered_metadata modifier (introduced with PTX 8.5)" (vllm-project#5149)
Co-authored-by: xuhao <xuhao@cambricon.com>