Param Size Mismatch after Fine-tuning Llama-3-8B-Instruct #6700

bw-wang19 · 2025-01-18T14:38:57Z

Reminder

I have read the above rules and searched the existing issues.

System Info

llamafactory version: 0.9.1.dev0
Platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.31
Python version: 3.11.0
PyTorch version: 2.5.1+cu124 (GPU)
Transformers version: 4.45.0
Datasets version: 2.21.0
Accelerate version: 0.34.2
PEFT version: 0.12.0
TRL version: 0.9.6
GPU type: Tesla V100-SXM2-32GB

Reproduction

My launch command:

fsdp_config='./fsdp_config.yaml'
train_file='./LLaMA-Factory/src/train.py'
train_config='./llama3_full_sft.yaml'
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file $fsdp_config \
    $train_file $train_config

where fsdp_config.yaml is:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_forward_prefetch: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_offload_params: true # offload may affect training speed
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: fp16 # or bf16
num_machines: 1 # the number of nodes
num_processes: 8 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

llama3_full_sft.yaml is:

bf16: false
cutoff_len: 2048
dataset: math_20k
dataset_dir: ./configs/finetuning
ddp_timeout: 180000000
do_train: true
eval_steps: 50
eval_strategy: steps
finetuning_type: full
gradient_accumulation_steps: 8
learning_rate: 3.0e-05
logging_steps: 5
lr_scheduler_type: cosine
max_samples: 20000
model_name_or_path: ./model_zoo/llama/llama-3-8b-Instruct
num_train_epochs: 2.0
output_dir: ./sftmodels/llama3-8b-instruct-math_20k-full-fsdp
overwrite_cache: true
overwrite_output_dir: true
per_device_eval_batch_size: 32
per_device_train_batch_size: 1
plot_loss: true
preprocessing_num_workers: 32
report_to: wandb
run_name: llama3-8b-instruct-math_20k-full-fsdp
save_steps: 200
stage: sft
template: llama3
val_size: 0.01
warmup_ratio: 0.01
fp16: true
optim: adamw_hf
fp16_full_eval: true
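Flat configs like this one are easy to break with small typos, for example a key written without a space after the colon (`model_name_or_path:./...`) or a misspelled boolean such as `ture`. Below is a minimal, stdlib-only sketch for flagging such lines in a flat `key: value` file. `lint_flat_yaml` and `BOOL_KEYS` are hypothetical names, and this is not a YAML parser: nested blocks (like `fsdp_config:` above), lists and quoted strings are not handled.

```python
import re

# Hypothetical set of keys expected to hold booleans; extend as needed.
BOOL_KEYS = {"do_train", "bf16", "fp16", "fp16_full_eval", "plot_loss",
             "overwrite_cache", "overwrite_output_dir"}

# key, then ':', then (possibly empty) whitespace, then the value
KEY_VALUE = re.compile(r"^([A-Za-z_][\w.-]*):(\s*)(.*)$")


def lint_flat_yaml(text: str) -> list[str]:
    """Return human-readable problems found in a flat `key: value` config."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        m = KEY_VALUE.match(line)
        if m is None:
            problems.append(f"line {lineno}: not a 'key: value' pair")
            continue
        key, space, value = m.groups()
        if value and not space:
            problems.append(f"line {lineno}: missing space after ':' in {key!r}")
        if key in BOOL_KEYS and value.lower() not in {"true", "false"}:
            problems.append(f"line {lineno}: {key} should be true/false, got {value!r}")
    return problems
```

Running `lint_flat_yaml(open("llama3_full_sft.yaml").read())` before launching returns an empty list when none of these patterns are present.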

Fine-tuning completed without raising any error. I then tried to load the fine-tuned model for inference with the transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = '/workspace/acl/sftmodels/llama3-8b-instruct-full-fsdp'
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=False, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

Loading the model raises the following error:

Loading checkpoint shards: 100%|██████████| 6/6 [00:00<00:00, 10.36it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 3
      1 model_path = '/workspace/acl/sftmodels/llama3-8b-instruct-medicine_20k-full-fsdp'
      2 tokenizer = AutoTokenizer.from_pretrained(model_path,trust_remote_code=False, device_map = "auto")
----> 3 model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:564, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    562 elif type(config) in cls._model_mapping.keys():
    563     model_class = _get_model_class(config, cls._model_mapping)
--> 564     return model_class.from_pretrained(
    565         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    566     )
    567 raise ValueError(
    568     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    569     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    570 )

File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/modeling_utils.py:4008, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3998     if dtype_orig is not None:
   3999         torch.set_default_dtype(dtype_orig)
   4001     (
   4002         model,
   4003         missing_keys,
   4004         unexpected_keys,
   4005         mismatched_keys,
   4006         offload_index,
   4007         error_msgs,
-> 4008     ) = cls._load_pretrained_model(
   4009         model,
   4010         state_dict,
   4011         loaded_state_dict_keys,  # XXX: rename?
   4012         resolved_archive_file,
   4013         pretrained_model_name_or_path,
   4014         ignore_mismatched_sizes=ignore_mismatched_sizes,
   4015         sharded_metadata=sharded_metadata,
   4016         _fast_init=_fast_init,
   4017         low_cpu_mem_usage=low_cpu_mem_usage,
   4018         device_map=device_map,
   4019         offload_folder=offload_folder,
   4020         offload_state_dict=offload_state_dict,
   4021         dtype=torch_dtype,
   4022         hf_quantizer=hf_quantizer,
   4023         keep_in_fp32_modules=keep_in_fp32_modules,
   4024         gguf_path=gguf_path,
   4025     )
   4027 # make sure token embedding weights are still tied if needed
   4028 model.tie_weights()

File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/modeling_utils.py:4553, in PreTrainedModel._load_pretrained_model(***failed resolving arguments***)
   4549     if "size mismatch" in error_msg:
   4550         error_msg += (
   4551             "\n\tYou may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method."
   4552         )
-> 4553     raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
   4555 if len(unexpected_keys) > 0:
   4556     archs = [] if model.config.architectures is None else model.config.architectures

RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
	size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([131334656]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
	size mismatch for model.norm.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
	size mismatch for lm_head.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
	You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
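The reported shapes are consistent with the checkpoint holding one rank's shard of an FSDP flat parameter instead of the full weights. This is an inference from the numbers, not a confirmed diagnosis. Assuming TRANSFORMER_BASED_WRAP wraps each decoder layer as its own FSDP unit, the root unit is left holding model.embed_tokens, model.norm and lm_head (untied in Llama-3-8B), and FULL_SHARD splits their flattened concatenation into 8 equal chunks. The sketch below (`shard_overlap` is a hypothetical helper) computes how many elements of each tensor fall into rank 0's chunk:

```python
VOCAB_SIZE = 128256  # from the error: lm_head is [128256, 4096]
HIDDEN_SIZE = 4096
WORLD_SIZE = 8       # num_processes in fsdp_config.yaml

# Assumption: decoder layers are separate FSDP units, so the root unit's flat
# parameter holds only these three tensors, in registration order.
ROOT_PARAMS = [
    ("model.embed_tokens.weight", VOCAB_SIZE * HIDDEN_SIZE),
    ("model.norm.weight", HIDDEN_SIZE),
    ("lm_head.weight", VOCAB_SIZE * HIDDEN_SIZE),  # not tied to the embeddings
]


def shard_overlap(params, world_size, rank):
    """Return (shard_numel, {name: elements of that param inside rank's shard}).

    Models FULL_SHARD splitting one flat parameter into equal chunks; any
    alignment padding FSDP may insert between parameters is ignored.
    """
    total = sum(numel for _, numel in params)
    shard = -(-total // world_size)  # ceil division
    lo, hi = rank * shard, (rank + 1) * shard
    overlap, offset = {}, 0
    for name, numel in params:
        overlap[name] = max(0, min(hi, offset + numel) - max(lo, offset))
        offset += numel
    return shard, overlap


shard, overlap = shard_overlap(ROOT_PARAMS, WORLD_SIZE, rank=0)
print(shard)    # 131334656, the 1-D size reported for embed_tokens
print(overlap)  # norm and lm_head get 0 elements, matching torch.Size([0])
```

Rank 0's chunk is exactly 131,334,656 elements and lies entirely inside embed_tokens, which would explain both the 1-D embed_tokens shape and the 0-element norm and lm_head entries in the error.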

Even when I pass `ignore_mismatched_sizes=True`, the same error is raised during inference with the model.
How should I solve it?
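One way to check whether a saved output directory holds full weights, without instantiating the model, is to read tensor shapes straight from the *.safetensors shards and compare them with what config.json implies. A minimal sketch, assuming the directory contains safetensors shards and a Llama-style config.json with `vocab_size`, `hidden_size` and `tie_word_embeddings`; `expected_shapes`, `read_shapes`, `find_mismatches` and `check_checkpoint` are hypothetical helpers, and only three tensors are checked.

```python
import json
from pathlib import Path


def expected_shapes(config: dict) -> dict[str, tuple[int, ...]]:
    """Shapes a Llama-style checkpoint should have for a few key tensors."""
    v, h = config["vocab_size"], config["hidden_size"]
    shapes = {"model.embed_tokens.weight": (v, h), "model.norm.weight": (h,)}
    if not config.get("tie_word_embeddings", False):
        shapes["lm_head.weight"] = (v, h)
    return shapes


def read_shapes(model_dir: str) -> dict[str, tuple[int, ...]]:
    """Read tensor shapes from every *.safetensors file without loading data."""
    from safetensors import safe_open  # imported lazily; needs `safetensors`

    shapes = {}
    for path in sorted(Path(model_dir).glob("*.safetensors")):
        with safe_open(str(path), framework="pt") as f:
            for name in f.keys():
                shapes[name] = tuple(f.get_slice(name).get_shape())
    return shapes


def find_mismatches(found, expected):
    """List (name, found_shape, expected_shape) for missing or wrong tensors."""
    return [
        (name, found.get(name), shape)
        for name, shape in expected.items()
        if found.get(name) != shape
    ]


def check_checkpoint(model_dir: str):
    """Compare the checkpoint's shapes against its own config.json."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return find_mismatches(read_shapes(model_dir), expected_shapes(config))
```

Calling `check_checkpoint` on the fine-tuned output directory should return an empty list when the checkpoint holds full, unsharded weights; for the run above it would be expected to list all three tensors.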

Others

No response


hiyouga · 2025-01-20T11:58:50Z

You can try using DeepSpeed ZeRO-3.
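A minimal sketch of what a ZeRO-3 setup could look like, assuming the accelerate FSDP config is dropped and a DeepSpeed JSON config is passed through the Hugging Face `deepspeed` training argument instead. The values are a starting point, not a tested recipe, and the file name ds_z3_config.json is hypothetical. `"auto"` entries are filled in by the Trainer's DeepSpeed integration from the training arguments, and `stage3_gather_16bit_weights_on_model_save` asks DeepSpeed to consolidate the partitioned weights when the model is saved.

```python
import json

# Hedged sketch of a DeepSpeed ZeRO-3 config for the HF Trainer integration.
ds_z3_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "fp16": {"enabled": "auto"},  # V100 has no native bf16, so fp16: true is used
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Gather full 16-bit weights on save instead of per-rank partitions.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}


def write_ds_config(path: str = "ds_z3_config.json") -> None:
    """Write the config to disk so the training YAML can reference it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ds_z3_config, f, indent=2)
```

The training YAML would then reference the file with `deepspeed: ds_z3_config.json`. LLaMA-Factory's examples launch DeepSpeed runs with `FORCE_TORCHRUN=1 llamafactory-cli train <config>.yaml` rather than `accelerate launch --config_file`; treat the exact launcher as an assumption to verify against the installed version.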

bw-wang19 added the labels bug (Something isn't working) and pending (This problem is yet to be addressed) on Jan 18, 2025.
