[CORE] Adding support for insertion of soft-tuned prompts #4645
Conversation
Some high-level comments: if this is not time-sensitive, I think it would be good if we could come up with some more generic APIs to promote code reuse between LoRA and prompt adapters.
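For illustration, here is a minimal sketch of what such a shared abstraction could look like. All class and method names below are hypothetical, not taken from the vLLM codebase:

```python
from abc import ABC, abstractmethod
from typing import Dict


class AdapterModel(ABC):
    """Common interface for any per-request adapter (LoRA or prompt adapter)."""

    def __init__(self, adapter_id: int) -> None:
        self.id = adapter_id

    @classmethod
    @abstractmethod
    def from_local_checkpoint(cls, path: str, adapter_id: int) -> "AdapterModel":
        """Load adapter weights from a local checkpoint directory."""


class AdapterModelManager(ABC):
    """Tracks resident adapters and activates them per batch; an LRU cache
    built on this interface could be shared by LoRA and prompt adapters."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._registry: Dict[int, AdapterModel] = {}

    @abstractmethod
    def activate_adapter(self, adapter_id: int) -> bool:
        """Make the adapter usable for the next forward pass."""

    @abstractmethod
    def deactivate_adapter(self, adapter_id: int) -> bool:
        """Remove the adapter from the active set."""

    def add_adapter(self, adapter: AdapterModel) -> bool:
        # Refuse when full; the caller evicts via its LRU policy first.
        if len(self._registry) >= self.capacity:
            return False
        self._registry[adapter.id] = adapter
        return True
```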
Hey @Yard1,
After your review, happy to add more extensive tests + docs.
@SwapnilDreams100 thanks! let me take a look
Hi @Yard1 just a friendly reminder to review this PR when you get a chance, thanks!
Hi @SwapnilDreams100, I have an initial implementation of adapter support for the OpenAI entry points based on https://github.com/SwapnilDreams100/vllm/tree/main. Would you be open to me contributing to your PR?
@SwapnilDreams100 Actually I just realized this is only applied during prefill. In this case we do not need to do anything special for CUDA graph support. The current code looks fine and it should be decently performant. We should just add a comment to say that this will not work with CUDA graphs.
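To make the prefill-only observation concrete, here is a simplified sketch of the kind of embedding-level insertion being discussed (not the PR's actual code; names and shapes are illustrative):

```python
import torch


def apply_soft_prompt(
    input_embeds: torch.Tensor,  # [num_prompt_tokens, hidden_dim]
    soft_prompt: torch.Tensor,   # [num_virtual_tokens, hidden_dim], trained via PEFT
    is_prefill: bool,
) -> torch.Tensor:
    # The virtual-token embeddings only need to be written into the sequence
    # once, at prefill; decode steps then attend to them through the KV cache.
    # Because this runs as an eager, data-dependent step, it is not captured
    # when the decode path is replayed via CUDA graphs.
    if not is_prefill:
        return input_embeds
    out = input_embeds.clone()
    # Overwrite the reserved leading positions with the learned soft prompt.
    out[: soft_prompt.shape[0]] = soft_prompt
    return out
```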
Approving pending CI passing
@SwapnilDreams100 first of all congrats! I'd like to add tests for the openai server, would it be best to wait for this to merge so I can open my own PR against main?
Sounds good @g-eoj, big thank you for your help!
Hey @Yard1 are we good to merge?
Thanks for this epic effort @SwapnilDreams100!! And big thanks to @Yard1 for the many detailed reviews. I'll merge it before any new conflicts pop up!
Big thank you to @Yard1 for your guidance on this, this was a great learning experience!
This PR adds support for inserting soft-tuned prompts (trained using PEFT) into the input embeddings. This functionality is required by the IBM team.
Summary of Changes (a usage sketch follows the list):
- `prompt_adapter` folder, similar to the `lora` folder, to create an LRU cache management system for multiple prompt adapters
- `prompt_adapter_config` engine parameter, for easy extension to more sophisticated prompt-tuning techniques in the future
- `prompt_adapter_request` parameter added to the generate functionality of the `LLMEngine`
- `bloom, llama, mistral`, easily extensible for others
- `bloom`
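Below is a hedged usage sketch of the flow described above. The `prompt_adapter_request` parameter comes from this PR's summary; the import path, the `PromptAdapterRequest` constructor fields, the `enable_prompt_adapter` flag, and the checkpoint path are assumptions and may differ from the merged code.

```python
from vllm import LLM, SamplingParams
# Assumed import path and request class, following this PR's naming.
from vllm.prompt_adapter.request import PromptAdapterRequest

llm = LLM(model="bigscience/bloomz-560m", enable_prompt_adapter=True)

# Points at a soft prompt trained with PEFT (e.g. peft.PromptTuningConfig)
# and saved to a local checkpoint directory (hypothetical path).
request = PromptAdapterRequest(
    prompt_adapter_name="my_soft_prompt",
    prompt_adapter_id=1,
    prompt_adapter_local_path="/path/to/peft/checkpoint",
    prompt_adapter_num_virtual_tokens=8,
)

outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(max_tokens=32),
    prompt_adapter_request=request,
)
print(outputs[0].outputs[0].text)
```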