[BugFix] skip loading lm_head for llama if word embeddings are tied #6796

prashantgupta24 · 2024-07-25T19:32:51Z

If tie_word_embeddings is true, lm_head is tied to the embeddings, so we should be able to skip loading lm_head.

Similar to what was done for gemma here - #3553

Since lm_head is tied to embeddings we should be able to skip loading lm_head

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
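For illustration, the check described above can be sketched as a filter over checkpoint entries. This is a minimal, dependency-free sketch and not the actual diff in this PR: the function name `iter_weights_to_load` and the plain-list stand-ins for tensors are hypothetical, and it assumes the checkpoint yields (name, tensor) pairs and that the head weight is stored under a name containing lm_head.weight.

```python
from typing import Any, Iterable, Iterator, Tuple


def iter_weights_to_load(
    weights: Iterable[Tuple[str, Any]],
    tie_word_embeddings: bool,
) -> Iterator[Tuple[str, Any]]:
    """Yield only the checkpoint entries that should be loaded.

    Hypothetical helper: when the config ties the word embeddings,
    lm_head shares its weight with the input embeddings, so a separate
    lm_head.weight entry in the checkpoint is redundant and is skipped.
    """
    for name, tensor in weights:
        # Single combined check: consult the config flag first,
        # then match the weight name.
        if tie_word_embeddings and "lm_head.weight" in name:
            continue
        yield name, tensor


# Stand-in checkpoint; plain lists take the place of real tensors.
checkpoint = [
    ("model.embed_tokens.weight", [0.1, 0.2]),
    ("lm_head.weight", [0.1, 0.2]),
    ("model.norm.weight", [1.0]),
]

tied_names = [name for name, _ in iter_weights_to_load(checkpoint, True)]
untied_names = [name for name, _ in iter_weights_to_load(checkpoint, False)]
```

When the flag is false, every entry passes through unchanged, so models with an independent lm_head are unaffected by the check.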

github-actions · 2024-07-25T19:33:01Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

DarkLight1337 · 2024-07-26T12:00:14Z

Seems that it has already been done in #6758. Would you mind if the other PR is adopted instead?

prashantgupta24 · 2024-07-26T17:30:14Z

Closing in favor of #6758

🐛 skip loading lm_head.weight

4f44925

Since lm_head is tied to embeddings we should be able to skip loading lm_head

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>

prashantgupta24 added 2 commits July 25, 2024 12:36

🐛 make sure to check tie_word_embeddings config first

0f58672

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>

🎨 use single if

ff663f6

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>

prashantgupta24 changed the title ~~🐛 skip loading lm_head~~ [BugFix] skip loading lm_head for llama if word embeddings are tied Jul 25, 2024

DarkLight1337 requested a review from simon-mo July 26, 2024 11:59

prashantgupta24 closed this Jul 26, 2024

prashantgupta24 deleted the lm-head-llama branch August 20, 2024 18:43
