# fsdp_config.yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_forward_prefetch: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_offload_params: true  # offload may affect training speed
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: fp16  # or bf16
num_machines: 1  # the number of nodes
num_processes: 8  # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
Loading checkpoint shards: 100%|██████████| 6/6 [00:00<00:00, 10.36it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[5], line 3
      1 model_path = '/workspace/acl/sftmodels/llama3-8b-instruct-medicine_20k-full-fsdp'
      2 tokenizer = AutoTokenizer.from_pretrained(model_path,trust_remote_code=False, device_map = "auto")
----> 3 model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py:564, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    562 elif type(config) in cls._model_mapping.keys():
    563     model_class = _get_model_class(config, cls._model_mapping)
--> 564     return model_class.from_pretrained(
    565         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    566     )
    567 raise ValueError(
    568     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    569     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    570 )
File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/modeling_utils.py:4008, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3998 if dtype_orig is not None:
   3999     torch.set_default_dtype(dtype_orig)
   4001 (
   4002     model,
   4003     missing_keys,
   4004     unexpected_keys,
   4005     mismatched_keys,
   4006     offload_index,
   4007     error_msgs,
-> 4008 ) = cls._load_pretrained_model(
   4009     model,
   4010     state_dict,
   4011     loaded_state_dict_keys,  # XXX: rename?
   4012     resolved_archive_file,
   4013     pretrained_model_name_or_path,
   4014     ignore_mismatched_sizes=ignore_mismatched_sizes,
   4015     sharded_metadata=sharded_metadata,
   4016     _fast_init=_fast_init,
   4017     low_cpu_mem_usage=low_cpu_mem_usage,
   4018     device_map=device_map,
   4019     offload_folder=offload_folder,
   4020     offload_state_dict=offload_state_dict,
   4021     dtype=torch_dtype,
   4022     hf_quantizer=hf_quantizer,
   4023     keep_in_fp32_modules=keep_in_fp32_modules,
   4024     gguf_path=gguf_path,
   4025 )
   4027 # make sure token embedding weights are still tied if needed
   4028 model.tie_weights()
File /opt/conda/envs/llama-factory/lib/python3.11/site-packages/transformers/modeling_utils.py:4553, in PreTrainedModel._load_pretrained_model(***failed resolving arguments***)
   4549 if "size mismatch" in error_msg:
   4550     error_msg += (
   4551         "\n\tYou may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method."
   4552     )
-> 4553 raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
   4555 if len(unexpected_keys) > 0:
   4556     archs = [] if model.config.architectures is None else model.config.architectures
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([131334656]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
size mismatch for model.norm.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
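A note on the shapes in this error: the 1-D size 131334656 is not arbitrary. The short Python check below is only a sketch, and it assumes that `model.embed_tokens.weight`, `model.norm.weight`, and `lm_head.weight` were flattened together into a single FSDP unit and split evenly across the 8 processes from the config. Under that assumption, the sizes reported above are what one rank's shard would look like, which suggests the saved checkpoint may contain unconsolidated (still sharded) parameters rather than full weights.

```python
# Shapes taken from the error message and the FSDP config in this issue.
vocab_size, hidden_size = 128256, 4096
world_size = 8  # num_processes

embed_tokens = vocab_size * hidden_size  # model.embed_tokens.weight
final_norm = hidden_size                 # model.norm.weight
lm_head = vocab_size * hidden_size       # lm_head.weight

# Assumption (not confirmed by the issue): these three parameters shared
# one flattened FSDP parameter that was split evenly across all ranks.
flat_numel = embed_tokens + final_norm + lm_head
shard_numel = flat_numel // world_size

print(flat_numel % world_size)     # 0, so the flat parameter divides evenly
print(shard_numel)                 # 131334656, the size seen in the checkpoint
print(shard_numel < embed_tokens)  # True
```

Because one shard is smaller than the embedding matrix alone, the first rank's shard would cover only part of `embed_tokens` and none of `norm` or `lm_head`, which matches the `torch.Size([0])` entries. If that reading is right, `ignore_mismatched_sizes=True` would not help, since it leaves the mismatched weights uninitialized instead of loading them.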
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
My bash entry:
in which,
fsdp_config.yaml is:
llama3_full_sft.yaml is:
After fine-tuning finished without raising any error, I tried to load the fine-tuned model and run inference with the transformers library:
The following error is raised:
Even if I pass `ignore_mismatched_sizes=True`, the same error is raised when running inference on the model.
How should I solve it?
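One way to narrow this down is to compare the tensor shapes stored in the checkpoint with the shapes the model expects. The helper below is a hypothetical sketch (the name `find_sharded_params` and its inputs are illustrative, not part of transformers or LLaMA-Factory). It flags parameters that were saved as 1-D tensors with a different element count than expected, which is the pattern in the error above.

```python
from math import prod


def find_sharded_params(checkpoint_shapes, expected_shapes):
    """Return (name, saved_shape, expected_shape) for every parameter whose
    saved tensor is 1-D and holds a different number of elements than the
    model expects, i.e. looks like a flattened or empty shard."""
    suspicious = []
    for name, expected in expected_shapes.items():
        saved = checkpoint_shapes.get(name)
        if saved is None:
            continue  # a missing key is a different problem
        if len(saved) == 1 and prod(saved) != prod(expected):
            suspicious.append((name, tuple(saved), tuple(expected)))
    return suspicious


# Shapes copied from the error message in this issue.
checkpoint_shapes = {
    "model.embed_tokens.weight": (131334656,),
    "model.norm.weight": (0,),
    "lm_head.weight": (0,),
}
expected_shapes = {
    "model.embed_tokens.weight": (128256, 4096),
    "model.norm.weight": (4096,),
    "lm_head.weight": (128256, 4096),
}

for name, saved, expected in find_sharded_params(checkpoint_shapes, expected_shapes):
    print(f"{name}: saved {saved}, expected {expected}")
```

For a real checkpoint, the saved shapes would have to be read from the weight files first, for example with the safetensors library if the checkpoint is stored in that format. If many parameters are flagged, the weights were probably not gathered into a full state dict before saving, and the save step is the place to look rather than the loading code.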
Others
No response