[PoC] accelerate support for trl #50

Closed
wants to merge 5 commits

Conversation

@younesbelkada (Contributor) commented on Dec 14, 2022

Hey there!

Let's see how we can integrate accelerate to enable training larger models (up to the 10B scale). Here is a draft version of how we could potentially approach that.

Firstly, I haven't written tests yet; for now, having example scripts in examples/ and running them while developing seems to be the right approach. In the near future we'll need to add tests. I also think that we should develop the code outside of nbdev.

Secondly, I think that we should be able to support more models instead of writing a new file per architecture. This will be addressed too.

To summarize, this PR addresses the following:

1- accelerate integration for naive training using accelerate

For now this only supports the native accelerate integration. It includes automatic mixed precision training and GPU device assignment. I still need to test the multi-GPU setup & Data Parallelism.
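Roughly, the naive integration boils down to something like the following (a minimal sketch, not the exact trainer code in this PR; the PPO-specific parts are omitted):

```python
from accelerate import Accelerator
from torch.optim import Adam
from transformers import AutoModelForCausalLM

# Minimal sketch: Accelerator handles device placement and the mixed-precision
# settings picked up from `accelerate launch` / `accelerate config`.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
optimizer = Adam(model.parameters(), lr=1e-5)

accelerator = Accelerator()
model, optimizer = accelerator.prepare(model, optimizer)

# inside the training loop, `loss.backward()` becomes:
#   accelerator.backward(loss)
```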

2- Support for xxxForCausalLM architecture

Anyone can start experimenting with any of the xxxForCausalLM architectures; this includes Bloom, OPT, etc.
The API looks like the following for now and can be adapted:

from transformers import AutoModelForCausalLM
from trl.models import AutoRegressiveLMWithValueHead

# load any xxxForCausalLM checkpoint from the Hub
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
# wrap it to add a value head on top of the language model
ref_model = AutoRegressiveLMWithValueHead(model)

3- Support latest version of transformers

This PR also adds slight modifications to support the latest transformers version.

4- Community orientation

I believe this repo should be a central point for anyone in the community who wants to experiment with hfrl. I think that we should educate users and contributors on how to implement a new training method. For now, to add a new training method a user should:

  • Implement their own xxxTrainer class that inherits from the BaseTrainer class
  • Add a clear example in examples/

That's it! (A minimal sketch follows below.)
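For illustration, a new trainer could look roughly like this (the import path and the methods required by BaseTrainer are placeholders here, not the actual interface in this PR):

```python
from trl import BaseTrainer  # import path assumed for illustration

class MyCustomTrainer(BaseTrainer):
    """Hypothetical trainer showing the intended extension pattern."""

    def __init__(self, model, tokenizer, **config):
        super().__init__()
        self.model = model
        self.tokenizer = tokenizer
        self.config = config

    def step(self, queries, responses, scores):
        # one optimization step of the custom training method
        ...

    def generate(self, query_tensors, **gen_kwargs):
        # produce responses for the given queries
        ...
```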

5- How am I testing the implementation?

For now I am running accelerate launch example/accelerate-ppo.py, observing the training dynamics of the model, and asserting that it behaves more or less the same as the reference run: https://wandb.ai/lvwerra/trl-showcase/runs/1jtvxb1m?workspace=
Usually after 6-7 steps the reward starts to skyrocket.

What is next?

  • discuss how we can improve the API
  • remove the dependency on nbdev
  • broader support for more architectures
  • test multi-GPU setup (DP)
  • Implement tests

cc @lvwerra

@younesbelkada (Contributor, Author) commented:

For now I am testing my implementation with accelerate launch example/ppo-accelerate.py

- add `make style` support
- add `accelerate` example that runs fine
- refactor `gpt2` to support latest version of `transformers`
- add `base` model to support `xxxForCausalLM` support
- refactor `accelerate` trainer
- add `utils` file
@younesbelkada (Contributor, Author) commented:

Regarding tests, this is tricky, but from what I can see we can for now:

  • Test that all trainers respect the inheritance from BaseTrainer (by checking that all the needed functions are implemented) - see the sketch after this list
  • Test that all models work as expected (thinking of the generate method) and that we can in fact support all xxxForCausalLM architectures as claimed above. From what I can see, as long as the model has a proper generate method, the PPOTrainer should work
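For the first point, a rough pytest-style sketch could look like this (the trainer list and required method names are placeholders):

```python
import pytest

# Placeholder list of trainers to check, e.g. [PPOTrainer, ...]
ALL_TRAINERS = []
# Placeholder for the methods every trainer is expected to implement
REQUIRED_METHODS = ["step", "generate"]

@pytest.mark.parametrize("trainer_cls", ALL_TRAINERS)
def test_trainer_respects_base_interface(trainer_cls):
    for method in REQUIRED_METHODS:
        assert callable(getattr(trainer_cls, method, None)), (
            f"{trainer_cls.__name__} is missing `{method}`"
        )
```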

@lvwerra (Member) left a comment

Awesome work @younesbelkada! I left a few comments, but I think figuring out all the details for the proposed changes will take some time, so what do you think about breaking it down a bit:

  • remove nbdev and set up a CI
  • a PR for the AutoModelForCausalLMWithValueHead (we can easily test whether this works as expected)
  • I think we only want the Accelerate trainer (which should be equivalent to what's there now if you just run accelerate on a single machine).
  • The two places where we would benefit the most from data parallelism are generation and step - let's think about how we can achieve that in the most elegant way.

I also need to read up a bit on current strategies. Maybe there are other ways to optimize the training loop (e.g. do the ref and trained models need to be different, or could it be the same model with two heads?).

How do I imagine the API?
from transformers import BloomForCausalLM, BloomTokenizer
from trl import AutoRegressiveLMWithValueHead
lvwerra (Member) commented:

What do you think about sticking to the transformers naming?

Suggested change:
- from trl import AutoRegressiveLMWithValueHead
+ from trl import AutoModelForCausalLMWithValueHead


#### Get response from gpt2
t = time.time()
response_tensors = ppo_trainer.get_response(query_tensors, **gen_kwargs)
lvwerra (Member) commented:

What do you think about using familiar naming here:

Suggested change:
- response_tensors = ppo_trainer.get_response(query_tensors, **gen_kwargs)
+ response_tensors = ppo_trainer.generate(query_tensors, **gen_kwargs)

Comment on lines +88 to +93
logs.update({'game_log': wandb.Table(columns=['query', 'response', 'reward'], rows=table_rows)})
logs.update(timing)
logs.update(stats)
logs['env/reward_mean'] = torch.mean(rewards).cpu().numpy()
logs['env/reward_std'] = torch.std(rewards).cpu().numpy()
logs['env/reward_dist'] = rewards.cpu().numpy()
lvwerra (Member) commented:

We should find a better way of logging things, maybe inside the trainer, similar to how the transformers Trainer does it.
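For example, the trainer could own a small logging helper so the example script doesn't have to assemble the logs dict by hand (a sketch only; method and key names are made up):

```python
import torch

class PPOTrainer:
    ...

    def log_stats(self, stats, timing, rewards):
        """Hypothetical helper that gathers everything currently logged in the example."""
        logs = dict(stats)
        logs.update(timing)
        logs["env/reward_mean"] = torch.mean(rewards).item()
        logs["env/reward_std"] = torch.std(rewards).item()
        # assumes the Accelerator was created with a tracker (see the logger comment further down)
        self.accelerator.log(logs)
```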


self.model, self.ref_model, self.optimizer, self.data_collator, self.dataloader = self.accelerator.prepare(self.model, self.ref_model, self.optimizer, self.data_collator, self.dataloader)

def _build_dataset(self, dataset_name="imdb"):
lvwerra (Member) commented:

I would keep this outside the Trainer; that's the user's responsibility, and inside we have a very specific toy example.

# HACK: do we really need this?
self.tokenizer.pad_token = self.tokenizer.eos_token

self.model = AutoRegressiveLMWithValueHead(base_model)
lvwerra (Member) commented:

Maybe we can write this class in a way that we can call just:

self.model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

instead of having the two-step process?
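A rough sketch of how the one-step API could be implemented (class internals are illustrative, not a final design):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

class AutoModelForCausalLMWithValueHead(nn.Module):
    """Illustrative wrapper: load the base LM and attach a scalar value head."""

    def __init__(self, base_model):
        super().__init__()
        self.base_model = base_model
        # hidden size lookup may need per-architecture handling (n_embd, d_model, ...)
        self.value_head = nn.Linear(base_model.config.hidden_size, 1)

    @classmethod
    def from_pretrained(cls, model_name, **kwargs):
        base_model = AutoModelForCausalLM.from_pretrained(model_name, **kwargs)
        return cls(base_model)
```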

wandb.watch(self.model, log='all')


def step(self, queries, responses, scores):
lvwerra (Member) commented:

The step call is by far the slowest part of the whole training process:
[screenshot: timing breakdown showing the step call dominating the training loop]

This means that we could profit a lot from parallelism (if it works). However, that would mean wrapping the data into a dataloader and doing minibatches in parallel. We would need to experiment and check the literature on how much we can actually do that with PPO.
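As a sketch of the idea only (whether PPO tolerates this kind of sharding still needs to be checked), the rollout could be wrapped in a DataLoader and prepared by accelerate so that each process runs the inner PPO epochs on its own shard of minibatches:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in rollout data; in practice queries/responses would first need to be
# padded into fixed-size tensors, which the current example does not do.
queries = torch.randint(0, 50_000, (256, 16))
responses = torch.randint(0, 50_000, (256, 24))
rewards = torch.randn(256)

rollout = TensorDataset(queries, responses, rewards)
minibatch_loader = DataLoader(rollout, batch_size=32, shuffle=True)
# minibatch_loader = accelerator.prepare(minibatch_loader)  # each process then sees its own shard

for batch_queries, batch_responses, batch_rewards in minibatch_loader:
    pass  # run the PPO minibatch update here
```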

self.config.update(config)

# Step 2: Initialize model, tokenizer and dataset
self._build_models_and_tokenizer()
lvwerra (Member) commented:

I think we can follow a logic similar to what we do in the Trainer/Evaluator classes: the models can either be provided as torch modules or as strings, in which case we load them.
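A small sketch of that pattern (the helper name is made up; only the argument handling is shown):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

def resolve_model(model_or_name):
    """Accept either an instantiated model or a checkpoint name, as Trainer/Evaluator do."""
    if isinstance(model_or_name, str):
        return AutoModelForCausalLM.from_pretrained(model_or_name)
    if isinstance(model_or_name, nn.Module):
        return model_or_name
    raise ValueError(f"expected a model or a string, got {type(model_or_name)}")
```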


timing['time/ppo/total'] = time.time()-t0
stats.update(timing)
return stats
lvwerra (Member) commented:

Since you initialize the W&B logger with accelerate, we can probably just pass all the stats to it internally and not put the burden on the user.

Also, I think we can make the type of logger a kwarg passed to the PPO trainer, so we don't require users to use W&B and they could also set up other loggers.
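One way this could look using accelerate's built-in tracker support (`log_with`, `init_trackers`, and `log` are existing accelerate APIs; how the kwarg is exposed on the PPO trainer is an assumption):

```python
from accelerate import Accelerator

# Hypothetical trainer-side wiring: the tracker type comes from a trainer kwarg,
# so users can pick "wandb", "tensorboard", or no tracker at all.
accelerator = Accelerator(log_with="wandb")
accelerator.init_trackers(project_name="trl-ppo", config={"ppo_epochs": 4})

# later, from the trainer's logging code:
accelerator.log({"env/reward_mean": 0.0}, step=0)
accelerator.end_training()
```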

@younesbelkada (Contributor, Author) commented:

Closing in favor of #58
