
v0.7.3

Released by github-actions on 20 Feb 17:08 · ed6e907

Highlights

🎉 253 commits from 93 contributors, including 29 new contributors!

  • DeepSeek enhancements:
    • Support for DeepSeek Multi-Token Prediction, 1.69x speedup in low QPS scenarios (#12755)
    • AMD support: DeepSeek tunings, yielding 17% latency reduction (#13199)
    • Using FlashAttention3 for MLA (#12807)
    • Align the expert selection code path with official implementation (#13474)
    • Optimize moe_align_block_size for deepseek_v3 (#12850)
  • V1 Engine:
    • LoRA Support (#10957, #12883)
    • Logprobs and prompt logprobs support (#9880), min_p sampling support (#13191), logit_bias in v1 Sampler (#13079)
    • Use msgpack for core request serialization (#12918)
    • Pipeline parallelism support (#12996, #13353, #13472, #13417, #13315)
    • Metrics enhancements: GPU prefix cache hit rate % gauge (#12592), iteration_tokens_total histogram (#13288), several request timing histograms (#12644)
    • Initial speculative decoding support with ngrams (#12193, #13365)
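Of the new V1 sampler features above, min_p sampling is simple to illustrate: keep only tokens whose probability is at least min_p times the most likely token's probability, then renormalize. A minimal pure-Python sketch of that rule (not vLLM's actual kernel; the function name is illustrative):

```python
import math

def min_p_filter(logits: list[float], min_p: float) -> list[float]:
    """Apply the min_p cutoff to a logit vector and return
    renormalized probabilities. Tokens below min_p * max_prob
    are zeroed out."""
    m = max(logits)
    probs = [math.exp(x - m) for x in logits]   # stable softmax, unnormalized
    total = sum(probs)
    probs = [p / total for p in probs]
    threshold = min_p * max(probs)              # cutoff scales with top token
    kept = [p if p >= threshold else 0.0 for p in probs]
    z = sum(kept)
    return [p / z for p in kept]
```

With a high min_p the distribution collapses onto the dominant tokens; with a low min_p only the long tail is pruned, which is the appeal over a fixed top-k.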

Model Support

  • Enhancements to Qwen2.5-VL: BNB support (#12944), LoRA (#13261), Optimizations (#13155)
  • Support Unsloth Dynamic 4bit BnB quantization (#12974)
  • IBM/NASA Prithvi Geospatial model (#12830)
  • Support Mamba2 (Codestral Mamba) (#9292), Bamba Model (#10909)
  • Ultravox Model: Support v0.5 Release (#12912)
  • transformers backend:
    • Enable quantization support for transformers backend (#12960)
    • Set torch_dtype in TransformersModel (#13088)
  • VLM:
    • Implement merged multimodal processor for Mllama (#11427), GLM4V (#12449), Molmo (#12966)
    • Separate text-only and vision variants of the same model architecture (#13157)

Hardware Support

  • Pluggable platform-specific scheduler (#13161)
  • NVIDIA: Support nvfp4 quantization (#12784)
  • AMD:
    • Per-Token-Activation Per-Channel-Weight FP8 (#12501)
    • Tuning for Mixtral on MI325 and Qwen MoE on MI300 (#13503), Mixtral8x7B on MI300 (#13577)
    • Add initial ROCm support to V1 (#12790)
  • TPU: V1 Support (#13049)
  • Neuron: Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921)
  • Gaudi:
    • Support Contiguous Cache Fetch (#12139)
    • Enable long-contexts + LoRA support (#12812)

Engine Features

  • Add sleep and wake-up endpoints, with v1 support (#12987)
  • Add /v1/audio/transcriptions OpenAI API endpoint (#12909)
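The transcription endpoint follows the OpenAI audio API shape: a multipart/form-data POST carrying a `model` field and a `file` field. A standard-library-only sketch of such a request against a local vLLM server — the host, port, and model name are assumptions, not values from these notes:

```python
import io
import urllib.request
import uuid

def build_multipart(model: str, filename: str, audio: bytes) -> tuple[str, bytes]:
    """Build a multipart/form-data body with `model` and `file` fields,
    the shape the OpenAI-style transcription route expects."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write((f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="model"\r\n\r\n'
                f"{model}\r\n").encode())
    body.write((f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n").encode())
    body.write(audio)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", body.getvalue()

def transcribe(audio: bytes, filename: str = "sample.wav",
               base_url: str = "http://localhost:8000") -> bytes:
    """POST audio bytes to /v1/audio/transcriptions (needs a running server)."""
    content_type, payload = build_multipart("openai/whisper-large-v3",  # assumed model
                                            filename, audio)
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

In practice the official `openai` client can target the same route by pointing its `base_url` at the vLLM server.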

Performance

  • Reduce TTFT with concurrent partial prefills (#10235)
  • LoRA - Refactor sgmv kernels (#13110)

Others

  • Make vLLM compatible with veRL (#12824)
  • Fixes for cases of FA2 illegal memory access error (#12848)
  • Choice-based structured output with xgrammar (#12632)
  • Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068)
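The idea behind choice-based structured output is constrained decoding: at each step, mask the logits so only tokens that keep the output a prefix of one allowed choice can be sampled. A toy character-level sketch of that loop (real grammar backends like xgrammar operate on model token IDs, and all names here are illustrative):

```python
def allowed_next_chars(prefix: str, choices: list[str]) -> set[str]:
    """Characters that extend `prefix` toward at least one allowed choice."""
    return {c[len(prefix)] for c in choices
            if c.startswith(prefix) and len(c) > len(prefix)}

def constrained_decode(choices: list[str], pick) -> str:
    """Greedy decode loop: `pick(prefix, options)` chooses among the
    allowed characters, standing in for argmax over masked logits."""
    out = ""
    while out not in choices:
        out += pick(out, allowed_next_chars(out, choices))
    return out
```

Whatever the model "prefers" at each step, the mask guarantees the final string is one of the allowed choices, which is why the mode is useful for classification-style outputs.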

What's Changed

New Contributors

Full Changelog: v0.7.2...v0.7.3