[PoC] accelerate support for trl #50
Conversation
For now I am testing my implementation with
Regarding tests, this is tricky but from what I can see we can for now:
Awesome work @younesbelkada! I left a few comments, but I think figuring out all the details for the proposed changes will take some time, so what do you think about breaking it down a bit:

- remove `nbdev` and set up a CI
- a PR for the `AutoModelForCausalLMWithValueHead` (we can easily test if this works as expected)
- I think we only want the Accelerate trainer (which should be equivalent to what's there now if you just run accelerate on a single machine).
- The two places where we would benefit the most from data parallelism are `generation` and `step` - let's think about how we can achieve that in the most elegant way.

I also need to read up a bit on current strategies. Maybe there are other ways to optimize the training loops (e.g. do the ref and trained model need to be different, or could it be the same model with two heads?).
How do I imagine the API?

```
from transformers import BloomForCausalLM, BloomTokenizer
from trl import AutoRegressiveLMWithValueHead
```
What do you think about sticking to the `transformers` naming?

`from trl import AutoRegressiveLMWithValueHead` → `from trl import AutoModelForCausalLMWithValueHead`
#### Get response from gpt2
t = time.time()
response_tensors = ppo_trainer.get_response(query_tensors, **gen_kwargs)
What do you think about using familiar naming here:

`response_tensors = ppo_trainer.get_response(query_tensors, **gen_kwargs)` → `response_tensors = ppo_trainer.generate(query_tensors, **gen_kwargs)`
logs.update({'game_log': wandb.Table(columns=['query', 'response', 'reward'], rows=table_rows)})
logs.update(timing)
logs.update(stats)
logs['env/reward_mean'] = torch.mean(rewards).cpu().numpy()
logs['env/reward_std'] = torch.std(rewards).cpu().numpy()
logs['env/reward_dist'] = rewards.cpu().numpy() |
We should find a better way of logging things, maybe inside the trainer, similar to how `Trainer` does it.
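A minimal sketch of that idea, assuming the trainer holds an `accelerate` `Accelerator` with a tracker configured (the `log_stats` method name is just illustrative):

```python
import torch


class PPOTrainer:
    # sketch only: assumes self.accelerator = Accelerator(log_with="wandb")
    # was created in __init__ and init_trackers() was called
    def log_stats(self, stats, timing, rewards):
        logs = {**stats, **timing}
        logs["env/reward_mean"] = torch.mean(rewards).item()
        logs["env/reward_std"] = torch.std(rewards).item()
        # forward everything to the configured tracker instead of calling
        # wandb.log directly in the user's example script
        self.accelerator.log(logs)
```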
self.model, self.ref_model, self.optimizer, self.data_collator, self.dataloader = self.accelerator.prepare(self.model, self.ref_model, self.optimizer, self.data_collator, self.dataloader)
def _build_dataset(self, dataset_name="imdb"): |
I would keep this outside the Trainer. That's the responsibility of the user, and inside we have a very specific toy example.
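For illustration, the toy IMDb preprocessing could then live in the example script (function and argument names here are hypothetical), and only the finished dataset would be handed to the trainer:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token


def build_dataset(dataset_name="imdb", max_length=64):
    # user-side preprocessing: turn raw reviews into tokenized queries
    ds = load_dataset(dataset_name, split="train")
    ds = ds.map(lambda x: tokenizer(x["text"], truncation=True, max_length=max_length))
    ds.set_format("torch")
    return ds


# the trainer only receives the ready-made dataset, e.g.:
# ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=build_dataset())
```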
# HACK: do we really need this?
self.tokenizer.pad_token = self.tokenizer.eos_token

self.model = AutoRegressiveLMWithValueHead(base_model)
Maybe we can write this class in a way that we can just call:
`self.model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)`
instead of having the two-step process?
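A rough sketch of how such a one-step class could look (the layer names and forward signature are assumptions, not the actual implementation):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM


class AutoModelForCausalLMWithValueHead(nn.Module):
    def __init__(self, pretrained_model):
        super().__init__()
        self.pretrained_model = pretrained_model
        # scalar value head on top of the LM's hidden states
        # (most configs expose hidden_size; some use n_embd)
        self.v_head = nn.Linear(pretrained_model.config.hidden_size, 1)

    @classmethod
    def from_pretrained(cls, model_name, **kwargs):
        # one-step construction: load the base model and wrap it internally
        return cls(AutoModelForCausalLM.from_pretrained(model_name, **kwargs))

    def forward(self, input_ids, attention_mask=None, **kwargs):
        outputs = self.pretrained_model(
            input_ids, attention_mask=attention_mask, output_hidden_states=True, **kwargs
        )
        values = self.v_head(outputs.hidden_states[-1]).squeeze(-1)  # (batch, seq_len)
        return outputs.logits, values
```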
wandb.watch(self.model, log='all')
def step(self, queries, responses, scores): |
The `step` call is by far the slowest part of the whole training process. This means that we could profit a lot from parallelism (if it works). However, that would mean wrapping the data into a dataloader and doing minibatches in parallel. We would need to experiment and check the literature on how much we can actually do that with PPO.
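A very rough sketch of what that could look like, assuming the rollout tensors are already padded to a common length so they can be batched (the helper and argument names are hypothetical):

```python
from torch.utils.data import DataLoader, TensorDataset


def ppo_step(accelerator, optimizer, ppo_loss_fn, queries, responses, scores, mini_batch_size=16):
    # wrap the rollout batch in a DataLoader and let accelerate shard the
    # minibatches across processes during the PPO optimization epochs
    loader = DataLoader(
        TensorDataset(queries, responses, scores),
        batch_size=mini_batch_size,
        shuffle=True,
    )
    loader = accelerator.prepare(loader)
    for mini_queries, mini_responses, mini_scores in loader:
        loss = ppo_loss_fn(mini_queries, mini_responses, mini_scores)  # placeholder for the PPO loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```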
self.config.update(config)

# Step 2: Initialize model, tokenizer and dataset
self._build_models_and_tokenizer() |
I think we can follow a similar logic to what we do in the Trainer/Evaluator classes: the models can either be provided as torch models or as strings, in which case we load them.
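A minimal sketch of that pattern (the helper name is hypothetical):

```python
from transformers import AutoModelForCausalLM


def resolve_model(model_or_name):
    # accept either an already-instantiated torch model or a model
    # identifier string, in which case we load it from the Hub
    if isinstance(model_or_name, str):
        return AutoModelForCausalLM.from_pretrained(model_or_name)
    return model_or_name
```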
timing['time/ppo/total'] = time.time()-t0
stats.update(timing)
return stats |
Since you initialize the W&B logger with accelerate, we can probably just pass all the stats to it internally and not put the burden on the user.
Also, I think we can make the type of logger a kwarg that's passed to PPO, so we don't require users to use W&B and they could also set up other loggers.
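`accelerate` already exposes this through the `log_with` argument of `Accelerator`, so the trainer could surface it roughly like this (the kwarg name on the trainer side is an assumption):

```python
from accelerate import Accelerator


class PPOTrainer:
    def __init__(self, config, log_with="wandb"):
        # log_with can be "wandb", "tensorboard", "comet_ml", ... or None to disable tracking
        self.accelerator = Accelerator(log_with=log_with)
        self.accelerator.init_trackers("trl", config=config)
```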
Closing in favor of #58
Hey there!
Let's see how we can integrate `accelerate` to enable training larger models (up to the 10B scale). Here is a draft version of how we could potentially approach that.

Firstly, I didn't write tests yet, but for now I think that having example scripts in `examples/` and running them while developing is the right approach. In the near future we'll need to add tests. I also think that we should develop the code outside `nbdev`.

Secondly, I think that we should be able to support more models instead of writing a new file per architecture. This will be addressed too.
To summarize, this PR addresses the following:

1- `accelerate` integration for naive training using `accelerate`

For now this only supports the native `accelerate` integration. It includes automatic mixed precision training and GPU device assignment. I still need to test the multi-GPU setup & Data Parallelism.

2- Support for the `xxxForCausalLM` architectures

Anyone can start experimenting using any of the `xxxForCausalLM` architectures; this would include any of `Bloom`, `OPT`, etc. The API looks like the following for now and can be adapted:
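For illustration, a minimal sketch of that API using the class names from this draft (the Bloom checkpoint is just an example, and the tokenizer class name may differ from the snippet quoted in the review above):

```python
from transformers import BloomForCausalLM, BloomTokenizerFast
from trl import AutoRegressiveLMWithValueHead

base_model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")

# wrap the pretrained causal LM with a value head before handing it to the PPO trainer
model = AutoRegressiveLMWithValueHead(base_model)
```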
3- Support the latest version of `transformers`

This PR also adds slight modifications to support the latest `transformers` version.

4- Community orientation
I believe this repo should be a central point for anyone in the community who wants to experiment with hfrl. I think that we should educate users and contributors on how to implement a new training method. For now, to add a new training method a user should:
- write a `xxxTrainer` class that inherits from the `BaseTrainer` class
- add an example script in `examples/`
That's it!
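For instance, a new method could start from a skeleton like this (the `BaseTrainer` interface shown here is an assumption):

```python
from trl import BaseTrainer


class MyNewTrainer(BaseTrainer):
    def __init__(self, model, tokenizer, **config):
        super().__init__()
        self.model = model
        self.tokenizer = tokenizer
        self.config = config

    def step(self, queries, responses, scores):
        # implement the optimization logic of the new training method here
        raise NotImplementedError
```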
5- How am I testing the implementation?

For now I am running `accelerate launch example/accelerate-ppo.py` and observing the training dynamics of the model, asserting that it has more or less the same behavior as the reference run: https://wandb.ai/lvwerra/trl-showcase/runs/1jtvxb1m?workspace= Usually after 6-7 steps the reward starts to skyrocket.
What is next?
- `nbdev`
cc @lvwerra