Add support for LoRA on vLLM #10009

apanteleev · 2024-08-01T18:20:31Z

What does this PR do?

Adds support for using LoRA adapters on checkpoints exported to vLLM.

Collection: NLP

Changelog

Moved the LoRA conversion logic from the convert_nemo_to_canonical.py script to a reusable module
Implemented on-load conversion of NeMo-format LoRA checkpoints into HF format for vLLM
Added support for enabling LoRAs on vLLM with automatic max rank detection
Fixed the logger initialization in the vLLM deployment script
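
As an illustration of the "automatic max rank detection" item, here is a minimal sketch, not the code from this PR. It assumes each LoRA checkpoint has already been converted to an HF (PEFT-style) adapter directory containing an `adapter_config.json` with an `r` field, and takes the largest `r` as the value the engine would need for its maximum LoRA rank. The function name `detect_max_lora_rank` and the directory layout are hypothetical; the PR may derive the rank differently, for example from tensor shapes.

```python
import json
import tempfile
from pathlib import Path


def detect_max_lora_rank(adapter_dirs, default_rank=0):
    """Return the largest LoRA rank (`r`) across HF-format adapter directories.

    Hypothetical helper: assumes every directory holds an adapter_config.json
    with an integer `r` field, as PEFT-style LoRA adapters do.
    """
    max_rank = default_rank
    for adapter_dir in adapter_dirs:
        config_path = Path(adapter_dir) / "adapter_config.json"
        with open(config_path, "r", encoding="utf-8") as f:
            config = json.load(f)
        max_rank = max(max_rank, int(config["r"]))
    return max_rank


if __name__ == "__main__":
    # Build two throwaway adapter directories to exercise the helper.
    with tempfile.TemporaryDirectory() as tmp:
        dirs = []
        for name, rank in [("adapter_a", 8), ("adapter_b", 32)]:
            adapter_dir = Path(tmp) / name
            adapter_dir.mkdir()
            config = {"peft_type": "LORA", "r": rank}
            (adapter_dir / "adapter_config.json").write_text(json.dumps(config))
            dirs.append(adapter_dir)
        print(detect_max_lora_rank(dirs))
```

Taking the maximum over all adapters means a single engine-wide setting can accommodate every adapter loaded at startup, at the cost of sizing buffers for the largest one.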

Usage

python deploy_vllm_triton.py -nc /path/to/checkpoint.nemo -lc /path/to/lora.nemo -tmn TEST ...
python query.py -mn TEST -p "Prompt text" -lt 0

PR Type:

New Feature

github-actions · 2024-08-20T01:51:04Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

* Added basic support for adding LoRA checkpoints in HF format when running deploy_vllm_triton.py
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Moved the conversion logic from the convert_nemo_to_canonical.py script to a reusable module, removed the tar unpacking, removed the dependencies on OmegaConf and NLPSaveRestoreConnector.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Implemented on-load conversion of Nemo format LoRA checkpoints into HF format for vLLM.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Added logger initialization, improved some messages.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Moved the LoRA converter script to nemo.export.utils.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Fixed the description of the query.py script.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: apanteleev <apanteleev@users.noreply.github.com>
* Fixed the missing file close.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
---------
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Signed-off-by: apanteleev <apanteleev@users.noreply.github.com>
Co-authored-by: apanteleev <apanteleev@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

apanteleev added 5 commits August 1, 2024 11:11

Added basic support for adding LoRA checkpoints in HF format when running deploy_vllm_triton.py

dc7a971

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

Moved the conversion logic from the convert_nemo_to_canonical.py script to a reusable module, removed the tar unpacking, removed the dependencies on OmegaConf and NLPSaveRestoreConnector.

2eab9e5

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

Implemented on-load conversion of Nemo format LoRA checkpoints into HF format for vLLM.

5b9ae6c

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

Added logger initialization, improved some messages.

388435c

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

Moved the LoRA converter script to nemo.export.utils.

2d7ae5f

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

github-advanced-security bot found potential problems Aug 1, 2024

nemo/export/utils/lora_converter.py (fixed)

ko3n1g added the Run CICD label Aug 1, 2024

apanteleev and others added 4 commits August 5, 2024 11:14

Fixed the description of the query.py script.

efffb27

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

Apply isort and black reformatting

3527f08

Signed-off-by: apanteleev <apanteleev@users.noreply.github.com>

Merge remote-tracking branch 'github/main' into vllm-lora

2f9a151

Fixed the missing file close.

4608c2c

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

apanteleev force-pushed the vllm-lora branch from 88ba019 to 4608c2c Compare August 5, 2024 18:16

github-actions bot added the stale label Aug 20, 2024

oyilmaz-nvidia approved these changes Aug 21, 2024

Merge branch 'main' into vllm-lora

a68b9d7

github-actions bot removed the stale label Aug 22, 2024

oyilmaz-nvidia and others added 4 commits August 22, 2024 17:00

Merge branch 'main' into vllm-lora

9d2ab6d

Merge branch 'main' into vllm-lora

e678137

Merge branch 'main' into vllm-lora

a319d02

Merge branch 'main' into vllm-lora

c522d1d

ericharper added Run CICD and removed Run CICD labels Aug 27, 2024

oyilmaz-nvidia merged commit d886151 into NVIDIA:main Aug 30, 2024
128 of 129 checks passed

apanteleev deleted the vllm-lora branch September 19, 2024 00:15
