Note on multiple-instance inference:
In vLLM inference, the number of attention heads must be divisible by the vLLM tensor parallel size. If we have an LLM with 14 attention heads, the options for tp are 1 and 2 (7 causes another division issue, though I forget exactly what it is).
Say we have 8 GPUs; then to utilize all of these devices, multi-instance vLLM inference is necessary (tp=1 -> 8 instances, tp=2 -> 4 instances).
The same applies to reward model inference and any other inference pipelines.
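As a concrete illustration, here is a minimal multi-instance launcher sketch, assuming vLLM's OpenAI-compatible API server entrypoint and pinning each instance to its own GPUs via `CUDA_VISIBLE_DEVICES`. The model path and ports are placeholders, and this is not LMFlow's actual launch code:

```python
# Minimal sketch: with 8 GPUs and tensor_parallel_size=2, launch 4 vLLM
# instances so that every GPU is used. Model path and ports are hypothetical.
import os
import subprocess

NUM_GPUS = 8
TP_SIZE = 2  # must divide the model's attention head count (14 heads -> tp in {1, 2})
NUM_INSTANCES = NUM_GPUS // TP_SIZE  # tp=1 -> 8 instances, tp=2 -> 4 instances

procs = []
for i in range(NUM_INSTANCES):
    # Pin each instance to its own contiguous slice of GPUs.
    gpu_ids = ",".join(str(g) for g in range(i * TP_SIZE, (i + 1) * TP_SIZE))
    procs.append(subprocess.Popen(
        [
            "python", "-m", "vllm.entrypoints.openai.api_server",
            "--model", "path/to/model",            # hypothetical model path
            "--tensor-parallel-size", str(TP_SIZE),
            "--port", str(8000 + i),               # one port per instance
        ],
        env={**os.environ, "CUDA_VISIBLE_DEVICES": gpu_ids},
    ))

for p in procs:
    p.wait()
```

Requests can then be spread across the resulting endpoints (ports 8000-8003 here) with any simple round-robin client.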
This document lists the features on LMFlow's roadmap. We welcome discussion of, and contributions to, the specific features in the related Issues/PRs. 🤗
## Main Features
- [ ] `chatbot.py` upgrade (Conversation_template #917)

## Usability
- [ ] Make the `vllm` package optional
- [ ] `hf_model_mixin`

## Bug fixes
- [ ] `model.generate()` with DeepSpeed ZeRO-3 ([BUG] The text cannot be generated successfully during the Raft step #861)
- [ ] `merge_lora`: LoRA merging with an absolute path
- [ ] `load_dataset`: long data fix ([Bug Fix] update load_dataset to support long data #878)
- [ ] `create_copied_dataclass`: compatibility when Python version >= 3.10 (`kw_only` issue, see the sketch after this list): [BUG]TypeError: Field.__init__() missing 1 required positional argument: 'kw_only' #903, [usability] deps streamlining #905
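The `kw_only` failure comes from Python 3.10 adding a required `kw_only` positional argument to `dataclasses.Field.__init__`, which breaks code that constructs `Field` objects directly with the pre-3.10 signature. Below is a minimal sketch of a version-guarded workaround, assuming that direct-construction pattern; the helper name `copy_field` is illustrative, not LMFlow's actual fix:

```python
# Sketch of a version guard for code that constructs dataclasses.Field
# objects directly (the pattern that #903 reports breaking): Python >= 3.10
# added a required `kw_only` positional argument to Field.__init__.
import dataclasses
import sys

def copy_field(f: dataclasses.Field) -> dataclasses.Field:
    args = [
        f.default, f.default_factory, f.init, f.repr,
        f.hash, f.compare, f.metadata,
    ]
    if sys.version_info >= (3, 10):
        args.append(f.kw_only)  # required positional argument since 3.10
    new_f = dataclasses.Field(*args)
    # name/type are normally filled in by the @dataclass machinery.
    new_f.name = f.name
    new_f.type = f.type
    return new_f
```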
## Issues left over from history
- [ ] `use_accelerator` -> `use_accelerate` typo fix (with the Accelerate support PR)
- [ ] `model_args.use_lora` leads to truncation of the sequence, mentioned in [Feature] reward model inferencer and dpov2 aligner #867

## Documentation