
[CORE] Adding support for insertion of soft-tuned prompts #4645

Merged: 91 commits into vllm-project:main, Jul 9, 2024

Conversation

SwapnilDreams100
Contributor

@SwapnilDreams100 SwapnilDreams100 commented May 7, 2024

This PR adds support for inserting soft-tuned prompts into the input embeddings (trained using PEFT).

This functionality is required by the IBM team.

Summary of Changes:

  • New prompt_adapter folder, modeled on the lora folder, providing an LRU cache management system for multiple prompt adapters
  • New prompt_adapter_config engine parameter, for easy extension to more sophisticated prompt-tuning techniques in the future
  • New prompt_adapter_request parameter added to the generate functionality of the LLMEngine
  • Support for several models (BLOOM, Llama, Mistral), easily extensible to others
  • A simple test demonstrating that prompt adapters work for BLOOM
  • Some parameter documentation is still pending; it will be completed after code review
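The LRU cache management mentioned in the first bullet can be sketched in a few lines. This is a hypothetical illustration, not vLLM's actual implementation; the class name PromptAdapterLRUCache and its methods are invented for the example:

```python
from collections import OrderedDict


class PromptAdapterLRUCache:
    """Minimal LRU cache for loaded prompt adapters (illustrative sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache: "OrderedDict[int, object]" = OrderedDict()

    def get(self, adapter_id: int):
        # Move the adapter to the most-recently-used position on access.
        if adapter_id not in self._cache:
            return None
        self._cache.move_to_end(adapter_id)
        return self._cache[adapter_id]

    def add(self, adapter_id: int, adapter) -> None:
        if adapter_id in self._cache:
            self._cache.move_to_end(adapter_id)
        self._cache[adapter_id] = adapter
        # Evict the least-recently-used adapter when over capacity.
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)


cache = PromptAdapterLRUCache(capacity=2)
cache.add(1, "adapter-1")
cache.add(2, "adapter-2")
cache.get(1)               # touch adapter 1 so it becomes most recent
cache.add(3, "adapter-3")  # over capacity: evicts adapter 2, the LRU entry
print(cache.get(2))  # None
print(cache.get(1))  # adapter-1
```

The point of the LRU policy is that only a bounded number of adapters stay resident at once, while recently used ones avoid reloading.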

@Yard1 Yard1 self-requested a review May 7, 2024 16:56
Collaborator

@Yard1 Yard1 left a comment


Some high-level comments: if this is not time-sensitive, I think it would be good to come up with some more generic APIs to promote code reuse between LoRA and prompt adapters.

Review threads (resolved): vllm/model_executor/models/baichuan.py (two threads), vllm/prompt_adapter/models.py
@SwapnilDreams100 force-pushed the main branch 3 times, most recently from 71adbbb to 23f741b on Jun 3, 2024
@SwapnilDreams100
Contributor Author

Hey @Yard1,
I have addressed your comments on this soft prompt tuning PR. Some updates:

  • New adapter_commons folder abstracting all the code shared between LoRA and prompt adapters
  • Reduced redundancy: instead of adding adapter-insertion code to every model class, the VocabParallelEmbedding layer is extended
  • New enable_prompt_adapter parameter, for more consistency with LoRA
  • Added testing for multi-adapter inference

After your review, happy to add more extensive tests + docs.
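The embedding-layer approach described above can be illustrated with a toy sketch. Everything here (VIRTUAL_TOKEN_START, soft_prompt, base_embed) is a hypothetical stand-in for the real layer, not vLLM's API: the idea is that token ids in a reserved virtual-token range index into the trained soft-prompt table instead of the vocabulary embedding.

```python
# Illustrative sketch of inserting soft-prompt rows into input embeddings.
VIRTUAL_TOKEN_START = 1000  # ids >= this index into the soft-prompt table


def base_embed(token_id: int) -> list[float]:
    # Stand-in for the model's vocabulary embedding lookup.
    return [float(token_id), 0.0]


# Trained soft prompt: one row per virtual token (e.g. from a PEFT checkpoint).
soft_prompt = [[0.1, 0.2], [0.3, 0.4]]


def embed_with_adapter(token_ids: list[int]) -> list[list[float]]:
    out = []
    for tid in token_ids:
        if tid >= VIRTUAL_TOKEN_START:
            # Virtual token: substitute the learned soft-prompt row.
            out.append(soft_prompt[tid - VIRTUAL_TOKEN_START])
        else:
            # Regular token: normal vocabulary embedding.
            out.append(base_embed(tid))
    return out


# Two virtual tokens prepended to a normal prompt of ids [5, 7].
embs = embed_with_adapter([1000, 1001, 5, 7])
print(embs[0])  # [0.1, 0.2]
```

Doing the substitution inside the embedding layer is what removes the need to touch each model class individually.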

@SwapnilDreams100 SwapnilDreams100 requested a review from Yard1 June 5, 2024 20:38
@Yard1
Collaborator

Yard1 commented Jun 5, 2024

@SwapnilDreams100 thanks! let me take a look

@SwapnilDreams100
Contributor Author

SwapnilDreams100 commented Jun 10, 2024

Hi @Yard1, just a friendly reminder to review this PR when you get a chance, thanks!
Once this design is approved, I'd be happy to update this with support for prefix tuning as well, which should be similar in design!

@g-eoj
Contributor

g-eoj commented Jun 12, 2024

Hi @SwapnilDreams100, I have an initial implementation of adapter support for the OpenAI entry points based on https://github.com/SwapnilDreams100/vllm/tree/main. Would you be open to me contributing to your PR?

Collaborator

@Yard1 Yard1 left a comment


@SwapnilDreams100 Actually I just realized this is only applied during prefill. In this case we do not need to do anything special for CUDA graph support. The current code looks fine and it should be decently performant. We should just add a comment to say that this will not work with CUDA graphs.

Review thread (resolved): vllm/prompt_adapter/models.py
SwapnilDreams100 and others added 4 commits July 8, 2024 15:05
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Collaborator

@Yard1 Yard1 left a comment


Approving, pending CI passing.

@g-eoj
Contributor

g-eoj commented Jul 9, 2024

@SwapnilDreams100 first of all congrats!

I'd like to add tests for the OpenAI server; would it be best to wait for this to merge so I can open my own PR against main?

@SwapnilDreams100
Contributor Author

Sounds good @g-eoj, big thank you for your help!

@SwapnilDreams100
Contributor Author

Hey @Yard1 are we good to merge?

@njhill
Member

njhill commented Jul 9, 2024

Thanks for this epic effort @SwapnilDreams100!! And big thanks to @Yard1 for the many detailed reviews.

I'll merge it before any new conflicts pop up!

@njhill njhill merged commit 4d6ada9 into vllm-project:main Jul 9, 2024
70 checks passed
@SwapnilDreams100
Contributor Author

Big thank you to @Yard1 for your guidance on this, this was a great learning experience!

adityagoel14 pushed a commit to adityagoel14/vllm-torchrun-test that referenced this pull request Jul 10, 2024
…ct#4645)

Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
(cherry picked from commit 4d6ada9)
@simon-mo simon-mo mentioned this pull request Jul 15, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024
…ct#4645)

Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
…ct#4645)

Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
…ct#4645)

Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
6 participants