[CORE] Adding support for insertion of soft-tuned prompts #4645
Conversation
Some high-level comments: if this is not time-sensitive, I think it would be good if we could come up with some more generic APIs to promote code reuse between LoRA and prompt adapters.
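For illustration, here is a minimal sketch of what such a shared abstraction could look like. All class and method names below are hypothetical, not taken from the vLLM codebase:

```python
from abc import ABC, abstractmethod
from typing import Dict


class AdapterModel(ABC):
    """Common interface for any per-request adapter (LoRA or prompt adapter)."""

    def __init__(self, adapter_id: int) -> None:
        self.id = adapter_id

    @classmethod
    @abstractmethod
    def from_local_checkpoint(cls, path: str, adapter_id: int) -> "AdapterModel":
        """Load adapter weights from a local checkpoint directory."""


class AdapterModelManager(ABC):
    """Tracks resident adapters and activates them per batch; an LRU cache
    built on this interface could be shared by LoRA and prompt adapters."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._registry: Dict[int, AdapterModel] = {}

    @abstractmethod
    def activate_adapter(self, adapter_id: int) -> bool:
        """Make the adapter usable for the next forward pass."""

    @abstractmethod
    def deactivate_adapter(self, adapter_id: int) -> bool:
        """Remove the adapter from the active set."""

    def add_adapter(self, adapter: AdapterModel) -> bool:
        # Refuse when full; the caller evicts via its LRU policy first.
        if len(self._registry) >= self.capacity:
            return False
        self._registry[adapter.id] = adapter
        return True
```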
Hey @Yard1,
After your review, happy to add more extensive tests + docs.
@SwapnilDreams100 thanks! let me take a look
Hi @Yard1 just a friendly reminder to review this PR when you get a chance, thanks!
Hi @SwapnilDreams100, I have an initial implementation of adapter support for the OpenAI entry points based on https://github.com/SwapnilDreams100/vllm/tree/main. Would you be open to me contributing to your PR?
@SwapnilDreams100 Actually I just realized this is only applied during prefill. In this case we do not need to do anything special for CUDA graph support. The current code looks fine and it should be decently performant. We should just add a comment to say that this will not work with CUDA graphs.
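To make the prefill-only observation concrete, here is a simplified sketch of the kind of embedding-level insertion being discussed (not the PR's actual code; names and shapes are illustrative):

```python
import torch


def apply_soft_prompt(
    input_embeds: torch.Tensor,  # [num_prompt_tokens, hidden_dim]
    soft_prompt: torch.Tensor,   # [num_virtual_tokens, hidden_dim], trained via PEFT
    is_prefill: bool,
) -> torch.Tensor:
    # The virtual-token embeddings only need to be written into the sequence
    # once, at prefill; decode steps then attend to them through the KV cache.
    # Because this runs as an eager, data-dependent step, it is not captured
    # when the decode path is replayed via CUDA graphs.
    if not is_prefill:
        return input_embeds
    out = input_embeds.clone()
    # Overwrite the reserved leading positions with the learned soft prompt.
    out[: soft_prompt.shape[0]] = soft_prompt
    return out
```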
Approving pending CI passing
@SwapnilDreams100 first of all congrats! I'd like to add tests for the openai server, would it be best to wait for this to merge so I can open my own PR against main?
Sounds good @g-eoj, big thank you for your help!
Hey @Yard1 are we good to merge?
Thanks for this epic effort @SwapnilDreams100!! And big thanks to @Yard1 for the many detailed reviews. I'll merge it before any new conflicts pop up!
Big thank you to @Yard1 for your guidance on this, this was a great learning experience!
This PR adds support for inserting soft-tuned prompts (trained using PEFT) into the input embeddings. This functionality is required by the IBM team.
Summary of Changes (a usage sketch follows the list):
- `prompt_adapter` folder, similar to the `lora` folder, to create an LRU cache management system for multiple prompt adapters
- `prompt_adapter_config` engine parameter, for easy extension to more sophisticated prompt-tuning techniques in the future
- `prompt_adapter_request` parameter added to the generate functionality of the `LLMEngine`
- `bloom, llama, mistral`, easily extensible for others
- `bloom`
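Below is a hedged usage sketch of the flow described above. The `prompt_adapter_request` parameter comes from this PR's summary; the import path, the `PromptAdapterRequest` constructor fields, the `enable_prompt_adapter` flag, and the checkpoint path are assumptions and may differ from the merged code.

```python
from vllm import LLM, SamplingParams
# Assumed import path and request class, following this PR's naming.
from vllm.prompt_adapter.request import PromptAdapterRequest

llm = LLM(model="bigscience/bloomz-560m", enable_prompt_adapter=True)

# Points at a soft prompt trained with PEFT (e.g. peft.PromptTuningConfig)
# and saved to a local checkpoint directory (hypothetical path).
request = PromptAdapterRequest(
    prompt_adapter_name="my_soft_prompt",
    prompt_adapter_id=1,
    prompt_adapter_local_path="/path/to/peft/checkpoint",
    prompt_adapter_num_virtual_tokens=8,
)

outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(max_tokens=32),
    prompt_adapter_request=request,
)
print(outputs[0].outputs[0].text)
```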