Add callback for saving trainable parameters and model config #178

Open
GirinMan wants to merge 2 commits into main
Conversation

GirinMan

Overview

  • This PR originates from Saving pytorch_model.bin with QLORA #123.
  • I ran into similar problems, but no one had committed a fix for them yet.
  • I added a callback which saves not only adapter_model.bin but also trainable_params.bin and the backbone model's configuration (config.json), so that the RoPE scaling configuration can be reused.

New callback: SavePeftModelCallback

  • In a new file named save_callback.py, I added a callback named SavePeftModelCallback, which saves the trained weights and the model config into a new directory.
  • The directory name has the form f"{args.output_dir}/step-{state.global_step}". The callback creates the directory automatically if it doesn't exist, so it can be used to store separate checkpoints at specific step intervals; a sketch of such a callback follows this list.
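
Below is a minimal sketch of what such a callback can look like. It is not the exact code in save_callback.py; it assumes a PEFT-wrapped model is passed to the callback via kwargs["model"] and that the wrapper exposes the backbone config as model.config. File names follow the PR description.

import os
import torch
from transformers import TrainerCallback

class SavePeftModelCallback(TrainerCallback):
    def on_save(self, args, state, control, **kwargs):
        model = kwargs["model"]
        checkpoint_dir = os.path.join(args.output_dir, f"step-{state.global_step}")
        os.makedirs(checkpoint_dir, exist_ok=True)

        # Save the LoRA adapter weights (adapter_model.bin / adapter_config.json).
        model.save_pretrained(checkpoint_dir)

        # Save the additional trainable parameters that are not LoRA weights
        # (e.g. embeddings and norms trained by LongLoRA).
        trainable_params = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if param.requires_grad and "lora_" not in name
        }
        torch.save(trainable_params, os.path.join(checkpoint_dir, "trainable_params.bin"))

        # Save the backbone model config (config.json) so the RoPE scaling
        # settings used during training can be reused later.
        model.config.save_pretrained(checkpoint_dir)
        return control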

Changes in merge_lora_weights_and_save_hf_model.py

  • While loading the backbone model, the script did not use the model config from training, so the merged and saved checkpoint carries no information about the RoPE scaling configuration.
  • I guess this is why the configs of the LongLoRA models on the Hugging Face Hub contain no rope_scaling information, even though it was changed during training. That's why I made SavePeftModelCallback save the model's config as well.
  • With the changes in this PR, merge_lora_weights_and_save_hf_model.py will try to load and use the model config saved during training, which contains the RoPE scaling information, as sketched after this list.
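
A rough sketch of the intended loading logic (not the exact diff in this PR; base_model_path and peft_checkpoint_dir are placeholder names):

import os
import torch
from transformers import AutoConfig, AutoModelForCausalLM

base_model_path = "meta-llama/Llama-2-7b-hf"   # backbone used during training
peft_checkpoint_dir = "output/step-1000"       # directory written by SavePeftModelCallback

# Prefer the config saved during training, which still carries rope_scaling.
if os.path.exists(os.path.join(peft_checkpoint_dir, "config.json")):
    config = AutoConfig.from_pretrained(peft_checkpoint_dir)
else:
    # Previous behaviour: fall back to the base model's config, losing rope_scaling.
    config = AutoConfig.from_pretrained(base_model_path)

model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    config=config,
    torch_dtype=torch.float16,
)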

See Llama-2-7b-longlora-8k/main/config.json

{
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.31.0.dev0",
  "use_cache": true,
  "vocab_size": 32001
}
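
Note that rope_scaling is null in the Hub config above, even though the model was fine-tuned for an 8k context. With the config saved by SavePeftModelCallback, the scaling settings survive and can be checked after training; a hypothetical example (the checkpoint path is an assumption):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("output/step-1000")
print(cfg.rope_scaling)               # should no longer be null when RoPE scaling was used
print(cfg.max_position_embeddings)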

Thank you so much for sharing and maintaining such great research!
If you have any feedback, please feel free to let me know.
