
Fix a bug in tying OPT embeddings #1

Merged
WoosukKwon merged 1 commit into main from fix-opt on Feb 25, 2023

Conversation

WoosukKwon
Collaborator

This PR fixes a bug in the support for OPT-350m/OPT-6.7b/OPT-13b and the OPT-IML models.

The bug occurred because our model code did not implement the methods required to tie the input and output embeddings.
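For context, "tying" here means the output projection (lm_head) reuses the input embedding matrix rather than keeping a separate copy, and loaders typically rely on accessor methods such as get_input_embeddings/get_output_embeddings to re-tie the weights after loading. A minimal PyTorch sketch of the idea (illustrative only, not the vLLM model code):

```python
import torch
import torch.nn as nn


class TinyOPTLikeLM(nn.Module):
    """Illustrative-only model showing tied input/output embeddings."""

    def __init__(self, vocab_size: int = 32, hidden_size: int = 16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # Tie the weights: the output projection shares the embedding matrix.
        self.lm_head.weight = self.embed_tokens.weight

    # Accessors like these are what loaders use to (re)tie weights after loading.
    def get_input_embeddings(self) -> nn.Embedding:
        return self.embed_tokens

    def get_output_embeddings(self) -> nn.Linear:
        return self.lm_head

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.embed_tokens(input_ids)  # real models run transformer layers here
        return self.lm_head(hidden)


if __name__ == "__main__":
    model = TinyOPTLikeLM()
    # Both modules point at the same parameter storage.
    assert model.lm_head.weight.data_ptr() == model.embed_tokens.weight.data_ptr()
```

OPT-350m additionally projects between the embedding width and the hidden width; the sketch ignores that detail.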

WoosukKwon merged commit cbf8779 into main on Feb 25, 2023
WoosukKwon deleted the fix-opt branch on February 25, 2023 00:29
CZT0 referenced this pull request in semedia-tech/vllm Sep 11, 2023
orangetin referenced this pull request in togethercomputer/vllm-ttgi Sep 14, 2023
add rope scaling as a cli arg so openai server can load rope scaled models
xiangyuT pushed a commit to xiangyuT/vllm that referenced this pull request Oct 18, 2023
bigPYJ1151 added a commit to bigPYJ1151/vllm that referenced this pull request Oct 30, 2023
l1cacheDell added a commit to CaspianFang/vllm that referenced this pull request Nov 15, 2023
hongxiayang referenced this pull request in hongxiayang/vllm Feb 13, 2024
ilya-lavrenov referenced this pull request in ilya-lavrenov/vllm Feb 19, 2024
Deterministic OpenVINO inference
Spycsh pushed a commit to Spycsh/vllm that referenced this pull request Feb 27, 2024
* Porting vllm to HPU

* add hpu cache allocate

* move slot_mapping to cpu and add is_prompt in cache_ops.reshape_and_cache

* add bucket to input metadata

* 1. limit max block number for lazy mode (TODO)
2. set some input metadata from cuda to cpu

* remove bucket for block tables

* add run bash script and change benchmark config

* 1. modify kv cache structure to tensors
2. update hpu paged attention API (for hpu graph compatibility)

* add attention mask for generation

* add multi_query_kv_attention attn_bias

* Temp commit

* Integrate fused kernels for RMSNorm and RoPE

* Resolve merge conflicts

* Minor Gaudi workarounds, add debugging to stock vLLM API server

* Fix post-merge pinned memory segfaults

* Re-enable sequence decode

* Maintain GPU compatibility in cache_engine

* Adjust HPU RoPE for non-query runs

* Integrate HPU primitive implementations

* Add xops bindings

* Cast paged attention inputs to bfloat16

* Remove leftover debug calls

* Update comments on HPU ops

* Restoring NVIDIA compatibility in setup.py

* vllm.hpu cleanup

* Added HPU-specific requirements

* Restored full functionality on NVIDIA

* vllm.core cleanup

* vllm init cleanup

* vllm.hpu cleanup

* vllm.benchmarks cleanup

* vllm.entrypoint cleanup

* Changed is_hpu logic (see the sketch after this commit log)

* vllm.benchmark cleanup

* Fixed importing condition

* tests cleanup

* removed dummy printings

* Update test_api_server.py

* restored attention and logprobs tests functionality on Nvidia

* throughput benchmark cleanup

* Changed Habana copyright header

* Restored alibi in bloom

* Added BSD license header

---------

Co-authored-by: Xiaotong Chen <xchen@habana.ai>
Co-authored-by: Jinyan Chen <jychen@habana.ai>
Co-authored-by: Mikhail Dvoretckii <mdvoretckii@habana.ai>
Co-authored-by: Sebastian Urwan <surwan@habana.ai>
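
The commit log above mentions changing the is_hpu logic and fixing an importing condition. A minimal sketch of how such a capability check is often written, assuming the Habana stack ships a `habana_frameworks` package (an assumption for illustration, not necessarily what this port actually does):

```python
import importlib.util
from functools import lru_cache


@lru_cache(maxsize=None)
def is_hpu() -> bool:
    """Return True if an HPU software stack appears to be importable.

    Illustrative only: checks for the presence of the `habana_frameworks`
    package without importing it eagerly, so NVIDIA-only environments
    are unaffected.
    """
    return importlib.util.find_spec("habana_frameworks") is not None


def default_device() -> str:
    # Fall back to CUDA when no HPU stack is found.
    return "hpu" if is_hpu() else "cuda"
```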
mujjingun added a commit to gmlwns2000/vllm-timber that referenced this pull request Apr 15, 2024
mzusman pushed a commit to mzusman/vllm that referenced this pull request Apr 16, 2024
BA-78554: Jurassic 2.5

* worked on the jurassic2.5 configuration file, updated the jurassic2_5 modeling file to support alternating experts/attn layers (see the sketch after this commit log)

* finished working the forward pass of jurassic3.py

* finished working the forward pass of jurassic3.py

* finished working the forward pass of jurassic3.py

* jurassic_3 modeling file works, uses dummy weights initialized by "dummy" flag. Tokenizer raises issues, for now copying the mixtral tokenizer

* changed default tokenizer vocab values, loading of custom .pt weight files works.

* removed notebook

* merging master to jurassic-2.5 to reset head

* Merge branch 'master' into jurassic-2.5

* align to master

Approved-by: Tomer Asida
Approved-by: Mor Zusman
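
The first item above describes alternating expert and attention layers. A rough sketch of how such an interleaved stack could be assembled (the function names and the `expert_every` spacing are illustrative assumptions, not the actual Jurassic modeling code):

```python
import torch.nn as nn


def build_alternating_stack(num_layers: int,
                            make_attention_layer,
                            make_expert_layer,
                            expert_every: int = 2) -> nn.ModuleList:
    """Interleave attention blocks with mixture-of-experts blocks.

    Every `expert_every`-th layer is an expert (MoE) block; the rest are
    plain attention blocks. Purely illustrative of the layout described
    in the commit message.
    """
    layers = []
    for i in range(num_layers):
        if (i + 1) % expert_every == 0:
            layers.append(make_expert_layer(i))
        else:
            layers.append(make_attention_layer(i))
    return nn.ModuleList(layers)


if __name__ == "__main__":
    stack = build_alternating_stack(
        num_layers=4,
        make_attention_layer=lambda i: nn.Identity(),  # placeholder attention block
        make_expert_layer=lambda i: nn.Identity(),     # placeholder MoE block
    )
    print(len(stack))  # 4 layers, alternating attention / expert
```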
Bellk17 added a commit to Bellk17/vllm that referenced this pull request May 10, 2024
ykim362 referenced this pull request in ykim362/vllm Jun 17, 2024
Group Gemm Version
alixiaodi mentioned this pull request Aug 2, 2024
robinren03 added a commit to robinren03/vllm that referenced this pull request Sep 10, 2024
ZhijieWang pushed a commit to ZhijieWang/vllm that referenced this pull request Oct 19, 2024
* feat: powv per token

* feat: add justfile

* fix: justfile

* fix: missing link in powv pass

* fix: powv calculation

* ref: powv to separate function

* fix: move to parent class

* feat: initial verify endpoint

* feat: initial verify endpoint

* fix: actually add as route

* feat(WIP): verify endpoint

* fix: sequence of ints instead of list for chat completion

* fix: loosen restrictions on verify chat completion

* fix: verifychatcompletion for get_powv

* fix: using wrong field

* fix: add verify into rpc layer

* fix: await verify

* fix: non-async fields

* fix: async handling

* fix: no more destruct

* feat: return powv to the top

* fix: send back via socket

* feat: add endpoint for completion

* feat: add version guards
Jeffwan pushed a commit to Jeffwan/vllm that referenced this pull request Oct 21, 2024
* Enable vineyard llm kv cache in vLLM

Based on another version of vllm: sighingnow@d347dab

Cherry-pick from commit d347dab

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
(cherry picked from commit 1545f6bf7edcd667e305d3fbcadd913066f04747)

resolving vllm update diff

temporarily comment out torch.distributed for single node env

add VineyardCacheConfig with https://github.com/v6d-io/v6d/blob/ebe8f077e3d3780a27d49238c501854b6b8e29df/modules/llm-cache/ds/kv_cache_block.cc#L163 commented out; cache_ops fix

remove CacheConfig from argument (configure through ENV; see the sketch after this commit log)

v6d: fix integration w/ v1 APIs

Signed-off-by: Haiyang Shi <haiyang.shi@bytedance.com>

Change model_runner to latest version

cherry pick model_runner from d347dab source sighingnow@d347dab

fix reshape_and_cache_flash argument

add cache prefetch/update to work_base

clean up

Fix after rebase to 029c71d

remove tensor copy from cache managed address to pin memory

clean up

* Add fixes to address comments

---------

Co-authored-by: Tao He <linzhu.ht@alibaba-inc.com>
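
One item above notes that the cache configuration was moved from a constructor argument to environment variables. A small sketch of that pattern (the variable names are hypothetical, not the ones this integration defines):

```python
import os
from dataclasses import dataclass


@dataclass
class KVCacheEnvConfig:
    """Illustrative env-driven config; the variable names are made up."""
    enabled: bool
    chunk_size: int
    socket_path: str

    @classmethod
    def from_env(cls) -> "KVCacheEnvConfig":
        # Read everything from the process environment with safe defaults,
        # so no config object has to be threaded through constructors.
        return cls(
            enabled=os.environ.get("KV_CACHE_ENABLED", "0") == "1",
            chunk_size=int(os.environ.get("KV_CACHE_CHUNK_SIZE", "16")),
            socket_path=os.environ.get("KV_CACHE_SOCKET", "/tmp/kv_cache.sock"),
        )
```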
Xaenalt pushed a commit to Xaenalt/vllm that referenced this pull request Dec 9, 2024