[Frontend] Warn if user `max_model_len` is greater than derived `max_model_len` #7080
Conversation
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
Thanks @fialhocoelho! Just have some minor rewording suggestions.
Some text nits Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
Thanks @fialhocoelho
@fialhocoelho could you merge in the main branch again, it should help with the CI test failures.
@njhill sure
…model_len` (vllm-project#7080) Signed-off-by: Jefferson Fialho <jfialho@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>
…model_len` (vllm-project#7080) Signed-off-by: Jefferson Fialho <jfialho@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Signed-off-by: Alvant <alvasian@yandex.ru>
Explanation of Changes
Summary
This update addresses the handling of user-specified `MAX_SEQUENCE_LENGTH` values that exceed the model's maximum sequence length. The behavior is controlled by the environment variable `VLLM_ALLOW_LONG_MAX_MODEL_LEN`, which defaults to `0`.

@njhill @Yard1 @robertgshaw2-neuralmagic @maxdebayser I mistakenly closed the other PR while syncing with the main branch. Here is the link to the previous one: #5911 (to maintain comment traceability).
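For illustration, here is a rough usage sketch, not taken from the PR itself: it assumes the standard `vllm.LLM` constructor, and the model name and length value are arbitrary placeholders chosen so that the requested `max_model_len` exceeds the model's derived maximum.

```python
import os

# Opt in to the override behavior described in this PR; the default ("0")
# keeps the hard error. Set it before vLLM builds its model config.
os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

from vllm import LLM

# opt-125m derives a max_model_len of 2048, so 4096 exceeds it.
# With the override enabled, this logs a warning instead of raising.
llm = LLM(model="facebook/opt-125m", max_model_len=4096)
```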
Motivation
Previously, an error was triggered whenever the user-specified `max_model_len` exceeded the derived value, since exceeding it can potentially lead to unintended behavior or CUDA errors. By optionally downgrading this to a warning, users are alerted without halting execution, allowing flexibility depending on their needs.

Notes

- Default behavior (`VLLM_ALLOW_LONG_MAX_MODEL_LEN=0`): If the user-specified `MAX_SEQUENCE_LENGTH` is greater than the model's maximum sequence length, an error is raised. This prevents potential incorrect model outputs or CUDA errors that could arise from exceeding the model's capabilities.
- Override behavior (`VLLM_ALLOW_LONG_MAX_MODEL_LEN=1`): When `VLLM_ALLOW_LONG_MAX_MODEL_LEN` is set to `1`, the system allows the user-specified `MAX_SEQUENCE_LENGTH` to exceed the model's maximum. Instead of raising an error, a warning message is logged, advising users to ensure the value is correct and within the model's context size. This provides flexibility for advanced users who understand the risks and need to override the default limit.
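The actual diff is not shown in this thread; the following is only a minimal sketch of the behavior described above (error by default, warning when the override is set), not the real vLLM implementation. The function name `_check_max_model_len` and the message wording are hypothetical.

```python
import logging
import os

logger = logging.getLogger(__name__)


def _check_max_model_len(user_max_model_len: int,
                         derived_max_model_len: int) -> int:
    """Hypothetical sketch: validate a user-specified max_model_len
    against the value derived from the model config."""
    if user_max_model_len <= derived_max_model_len:
        return user_max_model_len

    allow_long = os.getenv("VLLM_ALLOW_LONG_MAX_MODEL_LEN", "0") == "1"
    msg = (f"User-specified max_model_len ({user_max_model_len}) is greater "
           f"than the derived max_model_len ({derived_max_model_len}). "
           "Make sure the value is correct and within the model context size.")

    if allow_long:
        # Override mode: alert the user but keep going.
        logger.warning(msg)
        return user_max_model_len

    # Default mode: fail fast to avoid incorrect outputs or CUDA errors.
    raise ValueError(msg + " To allow overriding this maximum, set the "
                     "environment variable VLLM_ALLOW_LONG_MAX_MODEL_LEN=1.")
```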