
Bug with saving LoRA (adapter_model.bin) on latest peft from git #317

Closed
mcmonkey4eva opened this issue Apr 15, 2023 · 7 comments


mcmonkey4eva commented Apr 15, 2023

Setup

Using get_peft_model with task type CAUSAL_LM and transformers.Trainer(...) to train a LoRA on LLaMA (int8), then lora_model.save_pretrained(lora_file_path) to save the adapter.
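
Roughly, the setup looks like this (a sketch, not the exact webui code; base_model, training_args, train_data, and the LoRA hyperparameters are placeholders):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import Trainer

# base_model: a LLaMA model loaded in int8 (load_in_8bit=True);
# training_args / train_data: ordinary Trainer inputs.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
)
lora_model = get_peft_model(base_model, config)

trainer = Trainer(model=lora_model, args=training_args, train_dataset=train_data)
trainer.train()

# On the broken build, this writes a ~443-byte adapter_model.bin.
lora_model.save_pretrained(lora_file_path)
```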

The Issue

When saving at the end, adapter_model.bin is an empty pickle (443 bytes, containing a single 6-byte data entry).

This is, of course, wrong; there should be data in there. Prior versions of peft saved complete, valid files with actual content in them.

Checkpoints saved partway through by save_steps in the transformers Trainer do seem to contain full, valid data (though in a different format).
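
A quick way to confirm the symptom (a sketch, assuming the lora_file_path from the setup above):

```python
import os
import torch

path = f"{lora_file_path}/adapter_model.bin"
print(os.path.getsize(path))          # ~443 bytes on the broken build
state = torch.load(path, map_location="cpu")
print(state)                          # essentially empty: no LoRA weight tensors
```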

See oobabooga/text-generation-webui#1098 (comment) for more discussion of the issue externally.

Relevant technical details

pip install git+https://github.com/huggingface/peft exhibits the error (at time of writing, that is commit b21559e), but pip install peft==0.2.0 does not, which suggests the bug stems from a recent change.

Relevant source code replicating this issue: https://github.com/mcmonkey4eva/text-generation-webui/blob/lora-trainer-improvements-3/modules/training.py

Side Note

Likely a separate topic, but users have reported that on peft==0.2.0 they see huge VRAM spikes when save_pretrained is run. I haven't seen this myself yet, but it may indicate a broader need for more thorough validation of the saving code.
EDIT: Yes, it is indeed a separate topic; another user on GitHub reported that the VRAM spike is actually caused by bitsandbytes rather than peft.

(Also, it's a bit strange that pickles are being used at all - a separate topic entirely, but those should definitely be replaced with safetensors files.)
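
For illustration only, a sketch of what a safetensors-based save might look like (save_file wants a flat dict of named tensors; shared or non-contiguous tensors would need handling first):

```python
from safetensors.torch import save_file

# Hypothetical replacement for the pickle-based save. Assumes every value
# in the state dict is a plain torch.Tensor.
tensors = {k: v.contiguous() for k, v in lora_model.state_dict().items()}
save_file(tensors, f"{lora_file_path}/adapter_model.safetensors")
```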

@mcmonkey4eva (Author)

Update: after some testing, torch.save(trainer.model.state_dict(), f"{lora_file_path}/adapter_model.bin") is able to save a valid file.
But lora_model.save_pretrained(lora_file_path, state_dict=trainer.model.state_dict()) is not.

That hopefully both (A) helps narrow down where the issue lies, and (B) provides a functional workaround until it's properly fixed.
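
In full, the workaround looks like this (a sketch, using the trainer and lora_file_path from the setup above):

```python
import torch

# Works on the affected build: dump the state dict directly,
# bypassing save_pretrained's internal handling of the state dict.
torch.save(trainer.model.state_dict(), f"{lora_file_path}/adapter_model.bin")

# Still broken on the affected build: writes a near-empty pickle.
# lora_model.save_pretrained(lora_file_path, state_dict=trainer.model.state_dict())
```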

@pacman100 (Contributor)

Hello @mcmonkey4eva, I am using the latest main branch and running the following example: https://github.com/huggingface/peft/blob/main/examples/int8_training/Finetune_opt_bnb_peft.ipynb

I am unable to reproduce the above issue:
[Screenshot showing a correctly populated adapter_model.bin after saving]

@pacman100 (Contributor)

Could you please share minimal code that we can run to reproduce the above issue?

@pacman100 (Contributor)

As per tloen/alpaca-lora#293, it seems that uninstalling and reinstalling fixed this.

@pacman100 (Contributor)

Please also see #286.

@pacman100 (Contributor)

The above PR should have the fixes. Note that there were no issues with PEFT itself; they were related to alpaca-lora.

@mcmonkey4eva (Author)

That's perfect; that fixed it. Thank you so much for taking the time to investigate and get it fixed for everyone!
