Question about DeepSpeed checkpoint loading #138

Open · Wintoplay opened this issue Nov 18, 2024 · 1 comment

@Wintoplay

I tried to load LoRA training adapters from a DeepSpeed checkpoint directory:

ls Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000
total 696M
-rw-r--r-- 1 schwan46494@gmail.com CU   775 Nov 18 11:03 adapter_config.json
-rw-r--r-- 1 schwan46494@gmail.com CU  686M Nov 18 11:03 adapter_model.safetensors
-rw-rw-r-- 1 schwan46494@gmail.com CU  1.4K Nov 18 16:54 config.json
drwxr-xr-x 2 schwan46494@gmail.com CU  4.0K Nov 18 11:03 global_step6000
-rw-r--r-- 1 schwan46494@gmail.com CU    15 Nov 18 11:03 latest
-rw-r--r-- 1 schwan46494@gmail.com CU  5.1K Nov 18 11:03 README.md
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_0.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_1.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_2.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_3.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_4.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_5.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_6.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_7.pth
-rw-r--r-- 1 schwan46494@gmail.com CU  1.1K Nov 18 11:03 scheduler.pt
-rw-r--r-- 1 schwan46494@gmail.com CU   221 Nov 18 11:03 special_tokens_map.json
-rw-r--r-- 1 schwan46494@gmail.com CU   50K Nov 18 11:03 tokenizer_config.json
-rw-r--r-- 1 schwan46494@gmail.com CU  8.7M Nov 18 11:03 tokenizer.json
-rw-r--r-- 1 schwan46494@gmail.com CU 1023K Nov 18 11:03 trainer_state.json
-rw-r--r-- 1 schwan46494@gmail.com CU  6.5K Nov 18 11:03 training_args.bin
-rwxr--r-- 1 schwan46494@gmail.com CU   25K Nov 18 11:03 zero_to_fp32.py

instead of the usual Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2, because I want to perform error analysis of when my model becomes corrupted.

With this code:

# model_path is Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000
# base_model_path is a bunny variant model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model first, then attach the LoRA adapter saved in the
# DeepSpeed checkpoint directory (requires peft to be installed).
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,  # use torch.float32 on CPU
    device_map='auto',
    trust_remote_code=True)
model.load_adapter(model_path)

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True)

However, I get this warning:

Some weights of the model checkpoint at /home/11001207/chawanP/Teerapol/llama-3-typhoon-v1.5-8b-vision-preview were not used when initializing BunnyLlamaForCausalLM: ['model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.head.attention.in_proj_bias', 'model.vision_tower.vision_tower.vision_model.head.attention.in_proj_weight', 'model.vision_tower.vision_tower.vision_model.head.attention.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.head.attention.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.head.layernorm.bias', 'model.vision_tower.vision_tower.vision_model.head.layernorm.weight', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.head.probe']

  • This IS expected if you are initializing BunnyLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BunnyLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Loading adapter weights from /home/11001207/chawanP/pak/Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000 led to unexpected keys not found in the model: ['model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.lora_B.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.lora_B.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.lora_B.default.weight'].
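
For what it's worth, here is a quick sanity check (assuming model_path points at the checkpoint-6000 directory as above) to list which LoRA tensors the adapter file actually contains, so I can see whether the vision-tower layers are in it:

# List the tensor names stored in adapter_model.safetensors; the
# vision_tower entries are the ones reported as unexpected keys above.
import os
from safetensors import safe_open

adapter_file = os.path.join(model_path, "adapter_model.safetensors")
with safe_open(adapter_file, framework="pt", device="cpu") as f:
    for name in f.keys():
        if "vision_tower" in name:
            print(name)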

Questions

  • Is the model not fusing the vision adapters?
  • How do I load/convert these checkpoints? (Their schema is different: they have no non_lora_trainable.bin, config.json, and more.) A rough sketch of what I assume the loading would look like is below.
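
As a point of reference, this is roughly what I assume a direct PEFT loading path would look like for this checkpoint (untested; it relies only on the adapter_config.json and adapter_model.safetensors that are already in checkpoint-6000):

# Hypothetical alternative to model.load_adapter(): attach the adapter via
# peft directly; model and model_path are the same as in the code above.
from peft import PeftModel

peft_model = PeftModel.from_pretrained(model, model_path)
# peft_model.merge_and_unload() should fuse the LoRA weights into the base
# model, but I am not sure how that interacts with the vision tower here.

Separately, the bundled zero_to_fp32.py (from DeepSpeed) is meant to reconstruct full fp32 weights from the ZeRO shards in global_step6000 (e.g. python zero_to_fp32.py . pytorch_model.bin), but I do not know whether that is relevant when only LoRA adapters were trained.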
@Isaachhh
Collaborator
