Add support for LoRA on vLLM #10009

Merged: 14 commits, Aug 30, 2024

apanteleev (Contributor)

What does this PR do?

Adds support for using LoRA adapters on checkpoints exported to vLLM.

Collection: NLP

Changelog

  • Moved the LoRA conversion logic from the convert_nemo_to_canonical.py script to a reusable module
  • Implemented on-load conversion of NeMo-format LoRA checkpoints into HF format for vLLM
  • Added support for enabling LoRAs on vLLM with automatic max rank detection
  • Fixed the logger initialization in the vLLM deployment script
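
The automatic max rank detection listed above could plausibly work by scanning the tensor shapes of the converted adapter. The following is a minimal sketch, not the code from this PR: it assumes the HF PEFT convention that `lora_A` weights are shaped (rank, in_features), and the function name `detect_max_lora_rank` and the example tensor names are hypothetical.

```python
from typing import Mapping, Sequence


def detect_max_lora_rank(tensor_shapes: Mapping[str, Sequence[int]]) -> int:
    """Return the largest LoRA rank found among the `lora_A` tensors.

    Assumption: HF PEFT layout, where `lora_A` is (rank, in_features)
    and `lora_B` is (out_features, rank).
    """
    ranks = [
        shape[0]
        for name, shape in tensor_shapes.items()
        if "lora_A" in name and len(shape) == 2
    ]
    if not ranks:
        raise ValueError("no lora_A tensors found; is this a LoRA checkpoint?")
    return max(ranks)


# Hypothetical adapter: rank 16 on attention, rank 32 on the MLP.
example_shapes = {
    "base_model.model.layers.0.self_attn.q_proj.lora_A.weight": (16, 4096),
    "base_model.model.layers.0.self_attn.q_proj.lora_B.weight": (4096, 16),
    "base_model.model.layers.0.mlp.up_proj.lora_A.weight": (32, 4096),
    "base_model.model.layers.0.mlp.up_proj.lora_B.weight": (11008, 32),
}
print(detect_max_lora_rank(example_shapes))  # prints 32
```

Working from shapes alone keeps the sketch independent of any tensor library; with a real checkpoint the same mapping could be built from each tensor's `.shape`. The detected value would then be passed to the engine as its maximum LoRA rank, which may need rounding up to a value the engine accepts.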

Usage

python deploy_vllm_triton.py -nc /path/to/checkpoint.nemo -lc /path/to/lora.nemo -tmn TEST ...
python query.py -mn TEST -p "Prompt text" -lt 0

PR Type:

  • New Feature

…ning deploy_vllm_triton.py

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
…pt to a reusable module, removed the tar unpacking, removed the dependencies on OmegaConf and NLPSaveRestoreConnector.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
…F format for vLLM.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
ko3n1g added the Run CICD label Aug 1, 2024
apanteleev and others added 4 commits August 5, 2024 11:14
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: apanteleev <apanteleev@users.noreply.github.com>
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions bot added the stale label Aug 20, 2024
github-actions bot removed the stale label Aug 22, 2024
oyilmaz-nvidia merged commit d886151 into NVIDIA:main Aug 30, 2024
128 of 129 checks passed
BoxiangW pushed a commit to BoxiangW/NeMo that referenced this pull request Sep 4, 2024
* Added basic support for adding LoRA checkpoints in HF format when running deploy_vllm_triton.py

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* Moved the conversion logic from the convert_nemo_to_canonical.py script to a reusable module, removed the tar unpacking, removed the dependencies on OmegaConf and NLPSaveRestoreConnector.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* Implemented on-load conversion of Nemo format LoRA checkpoints into HF format for vLLM.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* Added logger initialization, improved some messages.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* Moved the LoRA converter script to nemo.export.utils.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* Fixed the description of the query.py script.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: apanteleev <apanteleev@users.noreply.github.com>

* Fixed the missing file close.

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

---------

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: apanteleev <apanteleev@users.noreply.github.com>
Co-authored-by: apanteleev <apanteleev@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
adityavavre pushed a commit to adityavavre/NeMo that referenced this pull request Sep 15, 2024
Signed-off-by: adityavavre <aditya.vavre@gmail.com>
@apanteleev apanteleev deleted the vllm-lora branch September 19, 2024 00:15
monica-sekoyan pushed a commit that referenced this pull request Oct 14, 2024
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
4 participants