[Frontend] Warn if user `max_model_len` is greater than derived `max_model_len` #7080
Conversation
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
Thanks @fialhocoelho! Just have some minor rewording suggestions.
Some text nits Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
Thanks @fialhocoelho
@fialhocoelho could you merge in the main branch again, it should help with the CI test failures.
@njhill sure
…model_len` (vllm-project#7080) Signed-off-by: Jefferson Fialho <jfialho@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>
…model_len` (vllm-project#7080) Signed-off-by: Jefferson Fialho <jfialho@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Signed-off-by: Alvant <alvasian@yandex.ru>
Explanation of Changes
Summary
This update addresses the handling of user-specified `MAX_SEQUENCE_LENGTH` values that exceed the model's maximum sequence length. The behavior is controlled by the environment variable `VLLM_ALLOW_LONG_MAX_MODEL_LEN`, which defaults to `0`.

@njhill @Yard1 @robertgshaw2-neuralmagic @maxdebayser I mistakenly closed the other PR while syncing with the main branch. Here is the link to the previous one: #5911 (to maintain comment traceability).
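For illustration, here is a rough usage sketch, not taken from the PR itself: it assumes the standard `vllm.LLM` constructor, and the model name and length value are arbitrary placeholders chosen so that the requested `max_model_len` exceeds the model's derived maximum.

```python
import os

# Opt in to the override behavior described in this PR; the default ("0")
# keeps the hard error. Set it before vLLM builds its model config.
os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

from vllm import LLM

# opt-125m derives a max_model_len of 2048, so 4096 exceeds it.
# With the override enabled, this logs a warning instead of raising.
llm = LLM(model="facebook/opt-125m", max_model_len=4096)
```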
Motivation
Previously, an error was triggered whenever the user-specified `max_model_len` exceeded the derived value, since exceeding it can potentially lead to unintended behavior or CUDA errors. By optionally downgrading this to a warning, users are alerted without halting execution, allowing flexibility depending on their needs.

Notes

- Default behavior (`VLLM_ALLOW_LONG_MAX_MODEL_LEN=0`): If the user-specified `MAX_SEQUENCE_LENGTH` is greater than the model's maximum sequence length, an error is raised. This prevents potential incorrect model outputs or CUDA errors that could arise from exceeding the model's capabilities.
- Override behavior (`VLLM_ALLOW_LONG_MAX_MODEL_LEN=1`): When `VLLM_ALLOW_LONG_MAX_MODEL_LEN` is set to `1`, the system allows the user-specified `MAX_SEQUENCE_LENGTH` to exceed the model's maximum. Instead of raising an error, a warning message is logged, advising users to ensure the value is correct and within the model's context size. This provides flexibility for advanced users who understand the risks and need to override the default limit.
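The actual diff is not shown in this thread; the following is only a minimal sketch of the behavior described above (error by default, warning when the override is set), not the real vLLM implementation. The function name `_check_max_model_len` and the message wording are hypothetical.

```python
import logging
import os

logger = logging.getLogger(__name__)


def _check_max_model_len(user_max_model_len: int,
                         derived_max_model_len: int) -> int:
    """Hypothetical sketch: validate a user-specified max_model_len
    against the value derived from the model config."""
    if user_max_model_len <= derived_max_model_len:
        return user_max_model_len

    allow_long = os.getenv("VLLM_ALLOW_LONG_MAX_MODEL_LEN", "0") == "1"
    msg = (f"User-specified max_model_len ({user_max_model_len}) is greater "
           f"than the derived max_model_len ({derived_max_model_len}). "
           "Make sure the value is correct and within the model context size.")

    if allow_long:
        # Override mode: alert the user but keep going.
        logger.warning(msg)
        return user_max_model_len

    # Default mode: fail fast to avoid incorrect outputs or CUDA errors.
    raise ValueError(msg + " To allow overriding this maximum, set the "
                     "environment variable VLLM_ALLOW_LONG_MAX_MODEL_LEN=1.")
```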