Param Size Mismatch after Fine-tuned on Llama-3-8B-Instruct #6700

Open
1 task done
bw-wang19 opened this issue Jan 18, 2025 · 1 comment
Labels
bug (Something isn't working), pending (This problem is yet to be addressed)

Comments

@bw-wang19

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.31
  • Python version: 3.11.0
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.45.0
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: Tesla V100-SXM2-32GB

Reproduction

My bash launch script:

fsdp_config='./fsdp_config.yaml'
train_file='./LLaMA-Factory/src/train.py'
train_config='./llama3_full_sft.yaml'
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file $fsdp_config \
    $train_file $train_config

where fsdp_config.yaml is:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_forward_prefetch: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_offload_params: true # offload may affect training speed
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: fp16 # or bf16
num_machines: 1 # the number of nodes
num_processes: 8 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

and llama3_full_sft.yaml is:

bf16: false
cutoff_len: 2048
dataset: math_20k
dataset_dir: ./configs/finetuning
ddp_timeout: 180000000
do_train: true
eval_steps: 50
eval_strategy: steps
finetuning_type: full
gradient_accumulation_steps: 8
learning_rate: 3.0e-05
logging_steps: 5
lr_scheduler_type: cosine
max_samples: 20000
model_name_or_path: ./model_zoo/llama/llama-3-8b-Instruct
num_train_epochs: 2.0
output_dir: ./sftmodels/llama3-8b-instruct-math_20k-full-fsdp
overwrite_cache: true
overwrite_output_dir: true
per_device_eval_batch_size: 32
per_device_train_batch_size: 1
plot_loss: true
preprocessing_num_workers: 32
report_to: wandb
run_name: llama3-8b-instruct-math_20k-full-fsdp
save_steps: 200
stage: sft
template: llama3
val_size: 0.01
warmup_ratio: 0.01
fp16: true
optim: adamw_hf
fp16_full_eval: true

Fine-tuning finished without raising any error. I then tried to load the fine-tuned model and run inference with the transformers library:

from transformers import (LlamaForCausalLM, 
                          LlamaTokenizer,
                          AutoTokenizer, 
                          AutoModelForCausalLM)
model_path = '/workspace/acl/sftmodels/llama3-8b-instruct-full-fsdp'
tokenizer = AutoTokenizer.from_pretrained(model_path,trust_remote_code=False, device_map = "auto")
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

This raises the following error:

Loading checkpoint shards: 100%|██████████| 6/6 [00:00<00:00, 10.36it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 3
      1 model_path = '/workspace/acl/sftmodels/llama3-8b-instruct-medicine_20k-full-fsdp'
      2 tokenizer = AutoTokenizer.from_pretrained(model_path,trust_remote_code=False, device_map = "auto")
----> 3 model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:564, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    562 elif type(config) in cls._model_mapping.keys():
    563     model_class = _get_model_class(config, cls._model_mapping)
--> 564     return model_class.from_pretrained(
    565         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    566     )
    567 raise ValueError(
    568     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    569     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    570 )

File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/modeling_utils.py:4008, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3998     if dtype_orig is not None:
   3999         torch.set_default_dtype(dtype_orig)
   4001     (
   4002         model,
   4003         missing_keys,
   4004         unexpected_keys,
   4005         mismatched_keys,
   4006         offload_index,
   4007         error_msgs,
-> 4008     ) = cls._load_pretrained_model(
   4009         model,
   4010         state_dict,
   4011         loaded_state_dict_keys,  # XXX: rename?
   4012         resolved_archive_file,
   4013         pretrained_model_name_or_path,
   4014         ignore_mismatched_sizes=ignore_mismatched_sizes,
   4015         sharded_metadata=sharded_metadata,
   4016         _fast_init=_fast_init,
   4017         low_cpu_mem_usage=low_cpu_mem_usage,
   4018         device_map=device_map,
   4019         offload_folder=offload_folder,
   4020         offload_state_dict=offload_state_dict,
   4021         dtype=torch_dtype,
   4022         hf_quantizer=hf_quantizer,
   4023         keep_in_fp32_modules=keep_in_fp32_modules,
   4024         gguf_path=gguf_path,
   4025     )
   4027 # make sure token embedding weights are still tied if needed
   4028 model.tie_weights()

File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/modeling_utils.py:4553, in PreTrainedModel._load_pretrained_model(***failed resolving arguments***)
   4549     if "size mismatch" in error_msg:
   4550         error_msg += (
   4551             "\n\tYou may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method."
   4552         )
-> 4553     raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
   4555 if len(unexpected_keys) > 0:
   4556     archs = [] if model.config.architectures is None else model.config.architectures

RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
	size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([131334656]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
	size mismatch for model.norm.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
	size mismatch for lm_head.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
	You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.

Even if I pass the argument ignore_mismatched_sizes=True, the same error is raised when I run inference on the model.
How can I solve this?
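
The shapes in the size-mismatch message can be checked with simple arithmetic. The sketch below uses only the numbers printed in the error and num_processes from fsdp_config.yaml; the helper names (numel, find_mismatches) are made up for illustration. It shows that the three saved tensors together hold exactly one eighth of the elements the model expects for those three parameters. That would be consistent with the checkpoint containing a single rank's slice of a flattened FSDP parameter rather than the gathered full weights, but this is an inference from the numbers, not something confirmed in this thread.

```python
from math import prod

# Shapes the model expects, as printed in the error message.
EXPECTED = {
    "model.embed_tokens.weight": (128256, 4096),
    "model.norm.weight": (4096,),
    "lm_head.weight": (128256, 4096),
}
# Shapes found in the checkpoint, as printed in the error message.
SAVED = {
    "model.embed_tokens.weight": (131334656,),
    "model.norm.weight": (0,),
    "lm_head.weight": (0,),
}
NUM_PROCESSES = 8  # num_processes in fsdp_config.yaml


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


def find_mismatches(saved, expected):
    """Return {name: (saved_shape, expected_shape)} for every differing shape."""
    return {
        name: (tuple(saved[name]), tuple(shape))
        for name, shape in expected.items()
        if name in saved and tuple(saved[name]) != tuple(shape)
    }


total_expected = sum(numel(s) for s in EXPECTED.values())
total_saved = sum(numel(s) for s in SAVED.values())

for name, (got, want) in find_mismatches(SAVED, EXPECTED).items():
    print(f"{name}: saved {got}, expected {want}")

# Compare the saved element count with an even split across the 8 processes.
print(total_expected, total_saved, total_expected // NUM_PROCESSES)
```

If the two last printed numbers agree, ignore_mismatched_sizes=True cannot help, since that option re-initializes the mismatched parameters instead of recovering the missing seven eighths of the data.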

Others

No response

bw-wang19 added the bug and pending labels on Jan 18, 2025
@hiyouga
Owner

hiyouga commented Jan 20, 2025

You can try using DeepSpeed ZeRO-3.
