PEFT doesn't inject virtual tokens into generate forward pass #2134
Comments
Could you please clarify where you check this? Also, note that #2096 is in the works, which should hopefully fix some issues that prefix tuning has with the latest transformers version. If possible, could you check whether that branch fixes the error?
#2096 doesn't fix anything except suppressing the warnings, since the code there only converts the legacy cache format. I then started to debug step by step until I found the problematic line.
I'm not sure if I follow. I set a debugger at the line you mentioned.
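For reference, a minimal sketch (the model name and num_virtual_tokens value are assumptions, not taken from this thread) of one way to inspect the prefix cache that PEFT builds for prefix tuning:

```python
# Minimal sketch: inspect the key/value cache PEFT builds for the virtual tokens.
# "gpt2" and num_virtual_tokens=20 are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import PrefixTuningConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20)
model = get_peft_model(base, config)

# get_prompt() returns the cache for the virtual tokens; during generate() this is
# what should end up being passed to the base model as past_key_values.
past_key_values = model.get_prompt(batch_size=1)
print(type(past_key_values))
```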
I had a similar problem. I am currently training a model with prefix tuning. To confirm this, I attempted to change all of the training data, but the issue reappeared when I subsequently loaded the model from disk.
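The exact training and loading calls were lost from the comment above; a hedged sketch of the save-then-reload flow being described (model name and paths are placeholders) could look like this:

```python
# Hedged sketch: save a prefix-tuned adapter and reload it from disk.
# "gpt2" and "prefix-tuned-checkpoint" are placeholders, not from the thread.
from transformers import AutoModelForCausalLM
from peft import AutoPeftModelForCausalLM, PrefixTuningConfig, get_peft_model

model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("gpt2"),
    PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20),
)
# ... training would happen here ...
model.save_pretrained("prefix-tuned-checkpoint")

# Reloading from disk is the step where the problem was observed.
reloaded = AutoPeftModelForCausalLM.from_pretrained("prefix-tuned-checkpoint")
```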
When I installed the peft package from the GitHub repo (0.13.3.dev0), I could use it. However, the following code still failed:

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

def inference(path_model, messages):
    tokenizer = AutoTokenizer.from_pretrained(path_model, use_fast=False)
    model = AutoPeftModelForCausalLM.from_pretrained(path_model, device_map="auto")
    model.eval()
    prompt_tokenized = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    output_tokenized = model.generate(input_ids=prompt_tokenized.to("cuda"), do_sample=False, max_length=10000)[0]
    # Drop the prompt tokens so only the newly generated text is decoded
    output_tokenized = output_tokenized[prompt_tokenized.size(1):]
    output = tokenizer.decode(output_tokenized, skip_special_tokens=True)
    return output
```

I met an error when running it, and it only works if I change my code.
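The exact change is not preserved above. Since the fix discussed below concerns models dispatched across several devices, one hypothetical workaround is to keep the whole model on a single device instead of using device_map="auto" (the path is a placeholder):

```python
# Hypothetical workaround sketch, not the commenter's actual change: keep the whole
# model on one device so the prefix cache never needs to be split across devices.
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

path_model = "path/to/adapter"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(path_model, use_fast=False)
model = AutoPeftModelForCausalLM.from_pretrained(path_model).to("cuda")
model.eval()
```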
See #2134. After introducing the usage of DynamicCache for prefix tuning, a bug could occur if the model is dispatched to different devices, because the key and value cache for each layer needs to be moved to that layer's respective device. The new code mostly consists of code copied from transformers, so that it stays consistent with how transformers solves this.
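A rough sketch of the idea (the helper name and the layer_device_map argument are assumptions; the actual change copies the corresponding logic from transformers):

```python
# Rough sketch: when a model is dispatched across devices, each layer's cached keys
# and values must live on that layer's device. `layer_device_map` (layer index ->
# device) is an assumed input here, not the real PR's interface.
from transformers import DynamicCache

def move_cache_to_layer_devices(cache: DynamicCache, layer_device_map: dict) -> DynamicCache:
    for layer_idx, device in layer_device_map.items():
        cache.key_cache[layer_idx] = cache.key_cache[layer_idx].to(device)
        cache.value_cache[layer_idx] = cache.value_cache[layer_idx].to(device)
    return cache
```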
@BenjaminBossan Thanks, I have pulled the latest commit 7295b33, and it fixes the issue in inference.
System Info

transformers version: 4.46.0.dev0

Who can help?

@BenjaminBossan @sayakpaul
Information

Tasks

An officially supported task in the examples folder

Reproduction
I met the problem with the following code:

When I add print(past_key_values) on the transformers side, I get DynamicCache(), which means the virtual tokens weren't injected into the forward pass.

Expected behavior

It should get a cache with a length of num_virtual_tokens.
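To make the expectation concrete, here is a hedged sketch (model name and num_virtual_tokens are assumptions) that reports how many cached positions the base model actually sees during generate(); with working prefix tuning this should be num_virtual_tokens on the first decoding step rather than 0:

```python
# Hedged sketch of the expected behavior; "gpt2" and num_virtual_tokens=20 are
# illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20))
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def report_cache(module, args, kwargs):
    pkv = kwargs.get("past_key_values")
    if pkv is not None and hasattr(pkv, "get_seq_length"):
        # Should report num_virtual_tokens (20 here) on the first forward call.
        print("cached positions seen by the base model:", pkv.get_seq_length())

hook = model.base_model.register_forward_pre_hook(report_cache, with_kwargs=True)
inputs = tokenizer("Hello", return_tensors="pt")
model.generate(**inputs, max_new_tokens=3)
hook.remove()
```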