No support for longer context lengths. #1108

Closed
bhumik1310 opened this issue Sep 20, 2023 · 1 comment

Comments

@bhumik1310

Model: Llama-2-chat-hf
The current implementation of vLLM returns finish_reason as 'length' even though the native model supports a context length of 4096 (and works well with the contexts we've tested it with). Is there an option to change the native context length supported by the vLLM instance?

I've retried the experiments with the latest release and the issue still persists.
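
A minimal sketch of what we're trying to do, assuming the engine exposes a max_model_len argument in the release we're running (an assumption on my part, not something confirmed for this version):

```python
from vllm import LLM, SamplingParams

# Assumption: max_model_len is accepted by the engine in this vLLM version.
# 4096 matches Llama-2's native context length.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", max_model_len=4096)

outputs = llm.generate(["<long prompt here>"], SamplingParams(max_tokens=256))

# finish_reason == 'length' means generation stopped at a token limit
# rather than at an end-of-sequence token.
print(outputs[0].outputs[0].finish_reason)
```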

@viktor-ferenczi
Contributor

viktor-ferenczi commented Nov 3, 2023

YaRN model support was already merged in 0.2.1, and recently merged PRs such as #1510 add further long-context support; it just isn't well documented yet.

Try the YaRN models from https://huggingface.co/NousResearch

I have tested YaRN models up to a 25k context length and they worked well.

Depending on your use case, you may be able to use models like Code Llama (16k) or Mistral (8k). I tested Code Llama up to the full 16k context and it worked well, even for reasoning.
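
A minimal sketch of serving one of those YaRN models with a longer window. The model name and the 32k value here are just examples, not prescriptions; check the model card for the length the checkpoint actually supports:

```python
from vllm import LLM, SamplingParams

# Example only: a YaRN-extended Llama-2 variant from NousResearch.
# The 32768 window is an illustrative value below the model's advertised limit.
llm = LLM(
    model="NousResearch/Yarn-Llama-2-13b-64k",
    max_model_len=32768,
)

long_prompt = "<paste a long document here>"
out = llm.generate([long_prompt], SamplingParams(max_tokens=512))
print(out[0].outputs[0].text)
```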

hmellor closed this as completed Mar 13, 2024