Batched generation #228
Conversation
Added some tests and ran the following benchmark (cc @edbeeching @natolambert) on the generation part:
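A minimal sketch of the kind of timing harness such a benchmark might use (function and variable names are illustrative, not the PR's actual script):

```python
import time

def time_generation(ppo_trainer, query_tensors, batch_size=None, **gen_kwargs):
    """Time one-by-one generation (batch_size=None) vs. a single batched call."""
    start = time.perf_counter()
    if batch_size is None:
        # pre-PR pattern: one generate() call per query
        _ = [ppo_trainer.generate(q, **gen_kwargs) for q in query_tensors]
    else:
        # post-PR pattern: one batched call over all queries
        _ = ppo_trainer.generate(query_tensors, batch_size=batch_size, **gen_kwargs)
    return time.perf_counter() - start
```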
Awesome speedup! Looks great in general 🚀
I left some comments; I think we need a safety checker to verify that `batch_size` is effectively smaller than `len(queries)`.
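A minimal sketch of such a check (the helper name and its placement are hypothetical; only `batch_size` and `queries` come from the discussion above):

```python
def _check_batch_size(batch_size: int, queries: list) -> int:
    """Hypothetical guard: never request a batch larger than the number of queries."""
    if batch_size <= 0:
        raise ValueError("`batch_size` must be a positive integer.")
    return min(batch_size, len(queries))  # clamp so small query lists still work
```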
trl/trainer/ppo_trainer.py (Outdated)

```python
if not return_prompt:
    output = output[(mask).sum() :]  # remove prompt
```
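For context, a toy illustration (not the PR's code) of what this slice does: the attention mask sums to the prompt length, so slicing at `mask.sum()` drops the prompt tokens.

```python
import torch

mask = torch.tensor([1, 1, 1])                # 3 prompt tokens
output = torch.tensor([10, 11, 12, 20, 21])   # prompt followed by 2 generated tokens
response = output[mask.sum():]                # tensor([20, 21]), prompt removed
```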
Why is this needed? I.e., in which case do we need to return the prompt?
We already have the query, and in all examples we spend 1-2 lines removing the queries from the generations. With this it's done automatically :)
Ahh, good point, ok. You mean here, right?
In this case, for API consistency, I think we also need to add it in the block that does not call batched generation.
Also note that for seq2seq models, the model already returns the response without the query: https://github.com/lvwerra/trl/blob/0610711ddab3ba1d8b5d41d31423c213b433472e/examples/sentiment/scripts/t5-sentiment.py#L153 so it might be worth adding another safety checker: if `self.is_encoder_decoder`, ignore `return_prompt`.
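A sketch of what that extra clause could look like inside the generation helper (a suggestion under the assumptions above, not the merged code; `self.is_encoder_decoder`, `return_prompt`, `output`, and `mask` are the names from the diff and discussion):

```python
if not self.is_encoder_decoder and not return_prompt:
    # decoder-only models prepend the query, so strip it off
    output = output[(mask).sum():]
# encoder-decoder models already return only the response,
# so `return_prompt` is effectively ignored for them
```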
> In this case, for API consistency, I think we also need to add it in the block that does not call batched generation.

We do it here, no? https://github.com/lvwerra/trl/blob/ee04bada9d4607c8273b41119d9adde98c7c9528/trl/trainer/ppo_trainer.py#L398
Good point about enc-dec, will add an extra clause.
Ah yes, thanks, indeed we do it there.
Addressed the comments @younesbelkada!
Awesome work! Thanks a lot for this!
This enables generating in batches with a custom batch size, rather than one by one. This simplifies the generation code and makes generation significantly faster! The changes are backwards compatible.
Code before:
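A representative sketch of the pre-PR pattern (variable names are illustrative; per the discussion above, the output included the query, so the prompt had to be stripped by hand):

```python
# Before: one generate() call per query, slicing the prompt off manually.
response_tensors = []
for query in query_tensors:
    output = ppo_trainer.generate(query)
    response_tensors.append(output.squeeze()[len(query):])  # drop prompt tokens
```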
Code after:
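And a sketch of the post-PR usage, using the `batch_size` and `return_prompt` arguments discussed in this thread (the specific values shown are assumptions):

```python
# After: a single batched call; prompts are removed automatically.
response_tensors = ppo_trainer.generate(
    query_tensors,        # list of query tensors
    batch_size=16,        # generate in batches of a custom size
    return_prompt=False,  # responses come back without the query
)
```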
Todo: Add some tests.