One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)
Reproduction
In my use case I need to call model.resize_token_embeddings(len(tokenizer)) after calling accelerator.prepare.
Why? Because I want to be able to load an accelerator state with accelerator.load_state, and for that I need to call accelerator.prepare beforehand (based on this issue - #285).
I am pretty sure that by calling resize_token_embeddings I am ruining the DeepSpeed initialization that happened inside accelerator.prepare for the embedding layer.
If I remove model.resize_token_embeddings or put it before calling accelerator.prepare, the script works.
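One way to check that hypothesis (a diagnostic sketch, not taken from the report) is to look for the bookkeeping attributes DeepSpeed ZeRO-3 attaches to the parameters it manages, such as ds_id:
# Parameters partitioned by ZeRO-3 carry DeepSpeed attributes such as ds_id.
# An embedding matrix freshly created by resize_token_embeddings after prepare() will not.
for name, param in model.named_parameters():
    if not hasattr(param, "ds_id"):
        print(f"not managed by DeepSpeed ZeRO-3: {name}")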
Here is a minimal reproducible example:
train.py
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
from accelerate import Accelerator
# Initialize a small model and tokenizer
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)
# Initialize Accelerator; DeepSpeed and ZeRO-3 come from the launch config file
accelerator = Accelerator()
# The empty list stands in for a dataloader in this minimal reproduction
model, optimizer, train_dataloader = accelerator.prepare(model, torch.optim.Adam(model.parameters()), [])
# Add tokens to the tokenizer
new_tokens = ['[NEW_TOKEN1]', '[NEW_TOKEN2]']
num_added_toks = tokenizer.add_tokens(new_tokens)
# Resize token embeddings - Expected to cause the issue
model.resize_token_embeddings(len(tokenizer))
# Dummy forward pass to test if everything works
inputs = tokenizer("Hello, this is a test", return_tensors="pt").to(accelerator.device)
outputs = model(**inputs)
print(outputs)
Run this:
accelerate launch --config_file deepspeed_config.yaml train.py
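The report also references deepspeed_config.yaml and zero3_minimal_example.json, whose contents are not shown. As a stand-in only, ZeRO-3 can also be enabled programmatically through Accelerate's DeepSpeedPlugin; the values below are assumptions, not the reporter's actual configuration:
from accelerate import Accelerator, DeepSpeedPlugin

# All values here are assumptions; the reporter's actual YAML/JSON configs are not available.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=3,                   # matches the "zero3" in zero3_minimal_example.json
    gradient_accumulation_steps=1,
    zero3_init_flag=True,           # partition the model weights at initialization
)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
This still has to be run under accelerate launch so that the distributed environment is set up.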
This is the error I get (although in my full example the error is a bit different: something about the model's parameters not having a ds_params field):
Traceback (most recent call last):
File "minimal_deepspeed_tokenizer_problem.py", line 23, in <module>
outputs = model(**inputs)
File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1768, in forward
loss = self.module(*inputs, **kwargs)
File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 789, in forward
distilbert_output = self.distilbert(
File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 607, in forward
embeddings = self.embeddings(input_ids, inputs_embeds) # (bs, seq_length, dim)
File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 120, in forward
input_embeds = self.word_embeddings(input_ids) # (bs, max_seq_length, dim)
File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
Expected behavior
I expect print(outputs) to work and the script not to crash.
Hello, whether training or resuming, please resize the embedding layer before preparing it; that way it stays independent of load_state. Please let us know if that resolves the issue.
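A minimal sketch of that suggested order, assuming the same model and tokenizer as the reproduction above (the checkpoint path is hypothetical):
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
from accelerate import Accelerator

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

# Add tokens and resize *before* prepare(), so DeepSpeed partitions an
# embedding matrix that already has the final vocabulary size.
tokenizer.add_tokens(['[NEW_TOKEN1]', '[NEW_TOKEN2]'])
model.resize_token_embeddings(len(tokenizer))

accelerator = Accelerator()
model, optimizer = accelerator.prepare(model, torch.optim.Adam(model.parameters()))

# When resuming, load the state only after prepare(); the path is hypothetical.
# accelerator.load_state("my_checkpoint_dir")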
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.