habana_main rebase#81
Merged
kzawora-intel merged 537 commits into habana_main from private/kzawora/rebase_v3 on Jul 2, 2024
+64,037 / -19,646
Commits
This pull request is big! We're only showing the most recent 250 commits.
Commits on Jun 13, 2024
- 10 commits (titles not shown)
Commits on Jun 14, 2024
- [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label (vllm-project#5073)
- 14 other commits (titles not shown)
Commits on Jun 15, 2024
- 6 commits (titles not shown)
Commits on Jun 17, 2024
- 11 commits (titles not shown)
Commits on Jun 18, 2024
- [Speculative Decoding 1/2] Add typical acceptance sampling as one of the sampling techniques in the verifier (vllm-project#5131)
- [Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization (vllm-project#5542)
- [Bugfix] Fix for inconsistent behaviour related to sampling and repetition penalties (vllm-project#5639)
- 14 other commits (titles not shown)
Commits on Jun 19, 2024
- [Bugfix][CI/Build][AMD][ROCm] Fixed the cmake build bug which generate garbage on certain devices (vllm-project#5641)
- [Frontend][Bugfix] Fix preemption_mode -> preemption-mode for CLI arg in arg_utils.py (vllm-project#5688)
- [Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (vllm-project#5650)
- 13 other commits (titles not shown)
Commits on Jun 20, 2024
- [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (vllm-project#5718)
- 4 other commits (titles not shown)
Commits on Jun 21, 2024
- 8 commits (titles not shown)
Commits on Jun 22, 2024
- [Misc] Remove vllm-project#4789 workaround left in vllm/entrypoints/openai/run_batch.py (vllm-project#5756)
- 4 other commits (titles not shown)
Commits on Jun 23, 2024
- 1 commit (title not shown)
Commits on Jun 24, 2024
- 8 commits (titles not shown)
Commits on Jun 25, 2024
- [Speculative Decoding] Support draft model on different tensor-parallel size than target model (vllm-project#5414)
- [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (vllm-project#5422)
- 18 other commits (titles not shown)
Commits on Jun 26, 2024
- [Core] Refactor Worker and ModelRunner to consolidate control plane communication (vllm-project#5408)
- 11 other commits (titles not shown)
Commits on Jun 27, 2024
- 17 commits (titles not shown)
Commits on Jun 28, 2024
- [VLM][BugFix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer. (vllm-project#5905)
- [Bugfix] Better error message for MLPSpeculator when num_speculative_tokens is set too high (vllm-project#5894)
- [Misc] Remove fp8_shard_indexer from Col/Row Parallel Linear (Simplify Weight Loading) (vllm-project#5928)
- 10 other commits (titles not shown)
Commits on Jun 29, 2024
- 10 commits (titles not shown)
Commits on Jun 30, 2024
Commits on Jul 1, 2024
- [Speculative Decoding 2/2] Integrate typical acceptance sampler into Spec Decode Worker (vllm-project#5348)
- 26 other commits (titles not shown)
Commits on Jul 2, 2024
- 6 commits (titles not shown)