Model: Llama-2-chat-hf
The current implementation of vLLM returns finish_reason as 'length', while the native model supports a context length of 4096 (and works well with the contexts we've tested it with). Is there an option to change the native context length supported by the vLLM instance?
I've retried the experiments with the latest release and the issue still persists.
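For reference, a minimal sketch of how we hit this with the offline LLM API; the model name, prompt, and max_tokens value below are placeholders rather than our exact settings:

```python
from vllm import LLM, SamplingParams

# Placeholder setup, not the exact experiment; it only illustrates where
# finish_reason comes back as 'length'.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

# finish_reason == 'length' is reported when generation stops because the
# max_tokens budget (or the model's context window) is exhausted, rather
# than because an EOS/stop token was produced.
params = SamplingParams(temperature=0.7, max_tokens=512)

outputs = llm.generate(["<long prompt close to 4k tokens>"], params)
print(outputs[0].outputs[0].finish_reason)  # 'length' in the failing runs
print(outputs[0].outputs[0].text)
```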
YaRN model support was already merged in 0.2.1. There have also been recently merged PRs, such as #1510, for longer context length support; it is just not well documented yet.
I have previously tested YaRN models at up to 25k model length and they worked well.
Depending on your use case, you may be able to use models like Code Llama (16k) or Mistral (8k). I tested Code Llama up to the full 16k and it worked well, even for reasoning.
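If it helps, here is a rough sketch of how a longer-context model can be served; treat the model name and the 16384 value as examples only, and check that the max_model_len argument is available in your vLLM version:

```python
from vllm import LLM, SamplingParams

# Example only: a 16k Code Llama checkpoint with the per-request limit
# raised to match. max_model_len caps prompt + generated tokens together.
llm = LLM(
    model="codellama/CodeLlama-7b-Instruct-hf",
    max_model_len=16384,
)

params = SamplingParams(temperature=0.2, max_tokens=1024)
out = llm.generate(["<long prompt>"], params)
print(out[0].outputs[0].finish_reason)
```

If you are running the OpenAI-compatible server instead, the same limit can be passed on the command line as --max-model-len.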