
The get_ppl missed the last token of each iteration during multi-iter prefill #2499

Merged: 14 commits into InternLM:main on Sep 26, 2024

Conversation

lvhan028 (Collaborator) commented on Sep 23, 2024:

  1. Fix the last token that was missed in each prefill iteration.
  2. Cap the memory footprint of the logits at 2 GB, preventing OOM when batch_size * seq_len is too large (both points are sketched below).
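For context, here is a minimal sketch of what the two fixes amount to. It is not lmdeploy's code: `decode` stands for a hypothetical stateful callable that keeps the KV cache across iterations and returns one row of logits per input token, and the 128k vocabulary, fp32 logits, and 2 GB budget are illustrative numbers.

```python
import torch


def get_ppl(token_ids, decode, vocab_size=128_000, max_logits_bytes=2 * 1024**3):
    """Perplexity over `token_ids` using multi-iteration (chunked) prefill."""
    # Point 2: bound the tokens per iteration so the fp32 logits of one chunk
    # stay under ~2 GB, preventing OOM when batch_size * seq_len is too large.
    max_seq_len = max(1, max_logits_bytes // (vocab_size * 4))

    nll, n_tokens = 0.0, 0
    for start in range(0, len(token_ids), max_seq_len):
        end = min(start + max_seq_len, len(token_ids))
        chunk = token_ids[start:end]
        logits = decode(chunk)  # [len(chunk), vocab_size]; KV cache keeps earlier context

        # Point 1: the logit at position p predicts token p + 1, so the last
        # logit of a chunk predicts the first token of the NEXT chunk. The bug
        # dropped that last logit in every iteration; keeping it means pairing
        # logits[start..end-1] with labels token_ids[start+1..end].
        labels = token_ids[start + 1:end + 1]
        if not labels:  # final chunk of length 1: nothing left to predict
            break
        logits = logits[:len(labels)]
        nll += torch.nn.functional.cross_entropy(
            logits, torch.as_tensor(labels), reduction='sum').item()
        n_tokens += len(labels)

    return float(torch.exp(torch.tensor(nll / n_tokens)))
```

Under these assumed numbers the 2 GB budget caps each iteration at roughly 4k tokens, so a long prompt is scored over several prefill passes instead of materializing one huge logits tensor.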

Review comment on lmdeploy/serve/utils.py (outdated; resolved)
lvhan028 requested a review from grimoire on September 23, 2024 13:56
lvhan028 (Collaborator, Author) commented:
@grimoire, can we remove decode from class Engine?
The pipeline calls engine_instance's decode and get_ppl directly.

grimoire (Collaborator) replied:

"can we remove decode from class Engine?"

Accepted.

Further review comments on lmdeploy/serve/utils.py (outdated; resolved)
lvhan028 changed the title from "The last token of each iteration is missed during multi-iter prefill" to "The get_ppl missed the last token of each iteration during multi-iter prefill" on Sep 26, 2024
irexyc (Collaborator) commented on Sep 26, 2024.
lvhan028 merged commit 4812b5a into InternLM:main on Sep 26, 2024
4 of 5 checks passed
4 participants