Release v0.1.17
Highlights
- Add data parallelism #480
- Add speculative execution for OpenAI API #250
- Update vllm to v0.4.3 for new quantization features #511
- Better error handling (#457, #449, #514)
What's Changed
- [Feat] Add llava qwen, llava mistral by @kcz358 in #419
- Format code by @hnyls2002 in #441
- Add finish_reason to OpenAI API by @mgerstgrasser in #446
- Simplify port allocation by @merrymercy in #447
- Add PUT for generate api by @Ying1123 in #448
- Improve error handling & abort disconnected requests by @merrymercy in #449
- Fix the broken --disable-radix-cache by @hnyls2002 in #451
- openai chat speculative execution by @ChuyueSun in #250
- Fix openai speculative execution by @Ying1123 in #456
- Abort disconnected requests by @merrymercy in #457
- Rename api_num_spec_tokens -> num_api_spec_tokens by @merrymercy in #458
- Use model loader from vllm by @merrymercy in #459
- port fp8 mixtral by @merrymercy in #460
- fix test bug in srt_llava_next_test.py by @bingwork in #470
- Add the instruction link to the LLaVA-NeXT-Video at README by @ZhangYuanhan-AI in #463
- Improve logging & add logit cap by @merrymercy in #471
- Optimize retract by @hnyls2002 in #440
- Add benchmark scripts by @Ying1123 in #476
- [Feat/Fix] Refactoring Llava models into single file by @Luodian in #475
- Improve benchmark scripts & rename some scripts by @merrymercy in #477
- Improve benchmark scripts & add more models by @merrymercy in #484
- Support data parallelism (static) by @Ying1123 in #480
- Make the server random by default by @merrymercy in #488
- Revert "Make the server random by default" by @Ying1123 in #492
- update the script: examples/usage/llava_video/srt_example_llava_v.sh by @ZhangYuanhan-AI in #491
- Make the server random by default by @merrymercy in #493
- Update vllm to v0.4.3 by @merrymercy in #511
- remove redundant pad_input_ids function by @amosyou in #500
- Litellm Backend by @huyiwen in #502
- Fix rid state map leak + Refactor .finished by @Qubitium in #505
- Crash the server when error or OOM happens by @merrymercy in #514
- Update version to 0.1.17 by @merrymercy in #515
New Contributors
- @kcz358 made their first contribution in #419
- @mgerstgrasser made their first contribution in #446
- @bingwork made their first contribution in #470
- @amosyou made their first contribution in #500
- @huyiwen made their first contribution in #502
Full Changelog: v0.1.16...v0.1.17