Add support for LoRA on vLLM #10009
Merged
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
oyilmaz-nvidia approved these changes on Aug 21, 2024
BoxiangW pushed a commit to BoxiangW/NeMo that referenced this pull request on Sep 4, 2024
* Added basic support for adding LoRA checkpoints in HF format when running deploy_vllm_triton.py
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Moved the conversion logic from the convert_nemo_to_canonical.py script to a reusable module, removed the tar unpacking, removed the dependencies on OmegaConf and NLPSaveRestoreConnector.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Implemented on-load conversion of Nemo format LoRA checkpoints into HF format for vLLM.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Added logger initialization, improved some messages.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Moved the LoRA converter script to nemo.export.utils.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Fixed the description of the query.py script.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: apanteleev <apanteleev@users.noreply.github.com>
* Fixed the missing file close.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
---------
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: apanteleev <apanteleev@users.noreply.github.com>
Co-authored-by: apanteleev <apanteleev@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
adityavavre pushed a commit to adityavavre/NeMo that referenced this pull request on Sep 15, 2024
monica-sekoyan pushed a commit that referenced this pull request on Oct 14, 2024
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request on Oct 25, 2024
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request on Nov 5, 2024
What does this PR do?
Adds support for using LoRA adapters on checkpoints exported to vLLM.
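As background, a LoRA adapter stores a low-rank update to a base weight matrix rather than a full copy of it. The sketch below illustrates the standard LoRA formulation, W' = W + (alpha / r) * B @ A, in plain Python. It is not code from this PR; the function names and the toy matrices are made up purely for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank."""
    r = len(lora_a)  # A has shape (r, in_features)
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out, r) @ (r, in) -> (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example (hypothetical values): 2x2 base weight, rank-1 adapter.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (r=1, in=2)
lora_b = [[0.5], [0.0]]  # shape (out=2, r=1)
merged = merge_lora(w, lora_a, lora_b, alpha=2.0)
print(merged)
```

Because only A and B are stored, an adapter checkpoint is much smaller than the base checkpoint, which is why it is passed as a separate file alongside the base model.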
Collection: NLP
Changelog
* Moved the conversion logic from the convert_nemo_to_canonical.py script to a reusable module
Usage
python deploy_vllm_triton.py -nc /path/to/checkpoint.nemo -lc /path/to/lora.nemo -tmn TEST ...
python query.py -mn TEST -p "Prompt text" -lt 0
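For scripted runs, the two commands above could be assembled programmatically. The following is a minimal sketch, not part of the PR: the helper names `build_deploy_cmd` and `build_query_cmd` are hypothetical, and only the flags that appear in the usage text (`-nc`, `-lc`, `-tmn`, `-mn`, `-p`, `-lt`) are used. Whatever further arguments the `...` in the deploy command stands for are left to the caller through `extra_args`.

```python
import shlex


def build_deploy_cmd(nemo_ckpt, lora_ckpt, model_name, extra_args=()):
    """Build argv for deploy_vllm_triton.py from the flags shown in the usage text."""
    return [
        "python", "deploy_vllm_triton.py",
        "-nc", nemo_ckpt,    # path to the .nemo base checkpoint
        "-lc", lora_ckpt,    # path to the LoRA checkpoint
        "-tmn", model_name,  # model name, "TEST" in the usage text
        *extra_args,         # anything the "..." stands for (not specified here)
    ]


def build_query_cmd(model_name, prompt, lt_value=0):
    """Build argv for query.py; lt_value is passed to -lt (0 in the usage text)."""
    return ["python", "query.py", "-mn", model_name, "-p", prompt, "-lt", str(lt_value)]


if __name__ == "__main__":
    deploy = build_deploy_cmd("/path/to/checkpoint.nemo", "/path/to/lora.nemo", "TEST")
    query = build_query_cmd("TEST", "Prompt text")
    # shlex.join quotes arguments that contain spaces, so the output is shell-safe.
    print(shlex.join(deploy))
    print(shlex.join(query))
```

The returned lists can be passed directly to `subprocess.run`, which avoids shell quoting issues for prompts that contain spaces or quotes.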
PR Type: