Training on 500 samples has no effect at all
#6339
Replies: 1 comment
- It looks like the dataset is too small, which is causing underfitting.
I don't know why, but even though the training data should be sufficient, the fine-tuned model performs poorly and the training seems to have had almost no effect at all. Any advice would be appreciated!
System Info
llamafactory version: 0.9.1.dev0
Below are some of my parameters and the dataset contents.
Below is my training loss curve:
Training script:
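(The parameter screenshots and the training script itself are not reproduced here. Purely for orientation, a WebUI LoRA run matching the log below roughly corresponds to a LLaMA-Factory YAML config along these lines; values that appear in the log are taken from it, everything else is a hypothetical placeholder, not the author's actual script.)

```yaml
# Sketch only -- reconstructed from the log below, not the original script.
model_name_or_path: /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f
stage: sft
do_train: true
finetuning_type: lora
lora_target: all                      # log shows q/k/v/o/gate/up/down_proj
dataset: data_no_history              # from "Loading dataset data_no_history.json"
template: qwen2_vl
per_device_train_batch_size: 2        # from the log
gradient_accumulation_steps: 8        # effective batch size 16
num_train_epochs: 3.0
learning_rate: 5.0e-5                 # initial lr in the log is ~4.96e-5 (cosine)
bf16: true
val_size: 0.1                         # 450 train / 50 eval examples
output_dir: saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22
```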
Log:
```
训练完毕。
[INFO|2024-12-15 22:34:50] parser.py:355 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16
[INFO|2024-12-15 22:34:50] configuration_utils.py:677 >> loading configuration file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/config.json
[INFO|2024-12-15 22:34:50] configuration_utils.py:746 >> Model config Qwen2VLConfig { "_name_or_path": "/root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f", "architectures": [ "Qwen2VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.46.1", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "in_chans": 3, "model_type": "qwen2_vl", "spatial_patch_size": 14 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file vocab.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file merges.txt
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file tokenizer.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file added_tokens.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file special_tokens_map.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file tokenizer_config.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2024-12-15 22:34:50] image_processing_base.py:373 >> loading configuration file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/preprocessor_config.json
[INFO|2024-12-15 22:34:50] image_processing_base.py:373 >> loading configuration file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/preprocessor_config.json
[INFO|2024-12-15 22:34:50] image_processing_base.py:429 >> Image processor Qwen2VLImageProcessor { "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "patch_size": 14, "processor_class": "Qwen2VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "max_pixels": 12845056, "min_pixels": 3136 }, "temporal_patch_size": 2 }
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file vocab.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file merges.txt
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file tokenizer.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file added_tokens.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file special_tokens_map.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2209 >> loading file tokenizer_config.json
[INFO|2024-12-15 22:34:50] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2024-12-15 22:34:51] processing_utils.py:755 >> Processor Qwen2VLProcessor:
image_processor: Qwen2VLImageProcessor { "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "patch_size": 14, "processor_class": "Qwen2VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "max_pixels": 12845056, "min_pixels": 3136 }, "temporal_patch_size": 2 }
tokenizer: Qwen2TokenizerFast(name_or_path='/root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={ 151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
{ "processor_class": "Qwen2VLProcessor" }
[INFO|2024-12-15 22:34:51] logging.py:157 >> Replace eos token: <|im_end|>
[INFO|2024-12-15 22:34:51] logging.py:157 >> Loading dataset data_no_history.json...
[INFO|2024-12-15 22:34:55] configuration_utils.py:677 >> loading configuration file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/config.json
[INFO|2024-12-15 22:34:55] configuration_utils.py:746 >> Model config Qwen2VLConfig { "_name_or_path": "/root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f", "architectures": [ "Qwen2VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.46.1", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "in_chans": 3, "model_type": "qwen2_vl", "spatial_patch_size": 14 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }
[INFO|2024-12-15 22:34:55] modeling_utils.py:3934 >> loading weights file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/model.safetensors.index.json
[INFO|2024-12-15 22:34:55] modeling_utils.py:1670 >> Instantiating Qwen2VLForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|2024-12-15 22:34:55] configuration_utils.py:1096 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 }
[INFO|2024-12-15 22:34:55] modeling_utils.py:1670 >> Instantiating Qwen2VisionTransformerPretrainedModel model under default dtype torch.bfloat16.
[WARNING|2024-12-15 22:34:55] logging.py:168 >> Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
[INFO|2024-12-15 22:35:00] modeling_utils.py:4800 >> All model checkpoint weights were used when initializing Qwen2VLForConditionalGeneration.
[INFO|2024-12-15 22:35:00] modeling_utils.py:4808 >> All the weights of Qwen2VLForConditionalGeneration were initialized from the model checkpoint at /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2VLForConditionalGeneration for predictions without further training.
[INFO|2024-12-15 22:35:00] configuration_utils.py:1049 >> loading configuration file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/generation_config.json
[INFO|2024-12-15 22:35:00] configuration_utils.py:1096 >> Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "temperature": 0.01, "top_k": 1, "top_p": 0.001 }
[INFO|2024-12-15 22:35:00] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2024-12-15 22:35:00] logging.py:157 >> Using torch SDPA for faster training and inference.
[INFO|2024-12-15 22:35:00] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2024-12-15 22:35:00] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2024-12-15 22:35:00] logging.py:157 >> Found linear modules: gate_proj,up_proj,q_proj,v_proj,o_proj,down_proj,k_proj
[INFO|2024-12-15 22:35:01] logging.py:157 >> trainable params: 20,185,088 || all params: 8,311,560,704 || trainable%: 0.2429
[INFO|2024-12-15 22:35:01] trainer.py:698 >> Using auto half precision backend
[INFO|2024-12-15 22:35:01] trainer.py:2313 >> ***** Running training *****
[INFO|2024-12-15 22:35:01] trainer.py:2314 >> Num examples = 450
[INFO|2024-12-15 22:35:01] trainer.py:2315 >> Num Epochs = 3
[INFO|2024-12-15 22:35:01] trainer.py:2316 >> Instantaneous batch size per device = 2
[INFO|2024-12-15 22:35:01] trainer.py:2319 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|2024-12-15 22:35:01] trainer.py:2320 >> Gradient Accumulation steps = 8
[INFO|2024-12-15 22:35:01] trainer.py:2321 >> Total optimization steps = 84
[INFO|2024-12-15 22:35:01] trainer.py:2322 >> Number of trainable parameters = 20,185,088
[INFO|2024-12-15 22:35:58] logging.py:157 >> {'loss': 0.6811, 'learning_rate': 4.9564e-05, 'epoch': 0.18}
[INFO|2024-12-15 22:36:49] logging.py:157 >> {'loss': 0.6575, 'learning_rate': 4.8272e-05, 'epoch': 0.36}
[INFO|2024-12-15 22:37:45] logging.py:157 >> {'loss': 0.6249, 'learning_rate': 4.6168e-05, 'epoch': 0.53}
[INFO|2024-12-15 22:38:37] logging.py:157 >> {'loss': 0.5894, 'learning_rate': 4.3326e-05, 'epoch': 0.71}
[INFO|2024-12-15 22:39:30] logging.py:157 >> {'loss': 0.6103, 'learning_rate': 3.9846e-05, 'epoch': 0.89}
[INFO|2024-12-15 22:40:16] logging.py:157 >> {'loss': 0.5776, 'learning_rate': 3.5847e-05, 'epoch': 1.07}
[INFO|2024-12-15 22:41:05] logging.py:157 >> {'loss': 0.5526, 'learning_rate': 3.1470e-05, 'epoch': 1.24}
[INFO|2024-12-15 22:41:55] logging.py:157 >> {'loss': 0.5321, 'learning_rate': 2.6868e-05, 'epoch': 1.42}
[INFO|2024-12-15 22:42:51] logging.py:157 >> {'loss': 0.5318, 'learning_rate': 2.2201e-05, 'epoch': 1.60}
[INFO|2024-12-15 22:43:43] logging.py:157 >> {'loss': 0.5415, 'learning_rate': 1.7631e-05, 'epoch': 1.78}
[INFO|2024-12-15 22:44:39] logging.py:157 >> {'loss': 0.5456, 'learning_rate': 1.3318e-05, 'epoch': 1.96}
[INFO|2024-12-15 22:45:35] logging.py:157 >> {'loss': 0.5419, 'learning_rate': 9.4128e-06, 'epoch': 2.13}
[INFO|2024-12-15 22:46:25] logging.py:157 >> {'loss': 0.5615, 'learning_rate': 6.0507e-06, 'epoch': 2.31}
[INFO|2024-12-15 22:47:09] logging.py:157 >> {'loss': 0.5684, 'learning_rate': 3.3494e-06, 'epoch': 2.49}
[INFO|2024-12-15 22:48:09] logging.py:157 >> {'loss': 0.4966, 'learning_rate': 1.4029e-06, 'epoch': 2.67}
[INFO|2024-12-15 22:48:58] logging.py:157 >> {'loss': 0.5051, 'learning_rate': 2.7923e-07, 'epoch': 2.84}
[INFO|2024-12-15 22:49:40] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/checkpoint-84
[INFO|2024-12-15 22:49:40] configuration_utils.py:677 >> loading configuration file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/config.json
[INFO|2024-12-15 22:49:40] configuration_utils.py:746 >> Model config Qwen2VLConfig { "architectures": [ "Qwen2VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.46.1", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "in_chans": 3, "model_type": "qwen2_vl", "spatial_patch_size": 14 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }
[INFO|2024-12-15 22:49:41] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/checkpoint-84/tokenizer_config.json
[INFO|2024-12-15 22:49:41] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/checkpoint-84/special_tokens_map.json
[INFO|2024-12-15 22:49:41] image_processing_base.py:258 >> Image processor saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/checkpoint-84/preprocessor_config.json
[INFO|2024-12-15 22:49:41] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/checkpoint-84/tokenizer_config.json
[INFO|2024-12-15 22:49:41] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/checkpoint-84/special_tokens_map.json
[INFO|2024-12-15 22:49:42] processing_utils.py:541 >> chat template saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/checkpoint-84/chat_template.json
[INFO|2024-12-15 22:49:42] trainer.py:2584 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2024-12-15 22:49:42] image_processing_base.py:258 >> Image processor saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/preprocessor_config.json
[INFO|2024-12-15 22:49:42] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/tokenizer_config.json
[INFO|2024-12-15 22:49:42] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/special_tokens_map.json
[INFO|2024-12-15 22:49:42] processing_utils.py:541 >> chat template saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/chat_template.json
[INFO|2024-12-15 22:49:42] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22
[INFO|2024-12-15 22:49:42] configuration_utils.py:677 >> loading configuration file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/config.json
[INFO|2024-12-15 22:49:42] configuration_utils.py:746 >> Model config Qwen2VLConfig { "architectures": [ "Qwen2VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.46.1", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "in_chans": 3, "model_type": "qwen2_vl", "spatial_patch_size": 14 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }
[INFO|2024-12-15 22:49:42] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/tokenizer_config.json
[INFO|2024-12-15 22:49:42] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22/special_tokens_map.json
[WARNING|2024-12-15 22:49:42] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2024-12-15 22:49:42] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2024-12-15 22:49:42] trainer.py:4117 >> ***** Running Evaluation *****
[INFO|2024-12-15 22:49:42] trainer.py:4119 >> Num examples = 50
[INFO|2024-12-15 22:49:42] trainer.py:4122 >> Batch size = 2
[INFO|2024-12-15 22:49:51] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
[INFO|modeling_utils.py:4808] 2024-12-15 22:51:50,813 >> All the weights of Qwen2VLForConditionalGeneration were initialized from the model checkpoint at /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2VLForConditionalGeneration for predictions without further training.
[INFO|configuration_utils.py:1049] 2024-12-15 22:51:50,817 >> loading configuration file /root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f/generation_config.json
[INFO|configuration_utils.py:1096] 2024-12-15 22:51:50,817 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.01,
"top_k": 1,
"top_p": 0.001
}
[INFO|2024-12-15 22:51:50] llamafactory.model.model_utils.attention:157 >> Using torch SDPA for faster training and inference.
[INFO|2024-12-15 22:51:51] llamafactory.model.adapter:157 >> Merged 1 adapter(s).
[INFO|2024-12-15 22:51:51] llamafactory.model.adapter:157 >> Loaded adapter(s): saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22
[INFO|2024-12-15 22:51:51] llamafactory.model.loader:157 >> all params: 8,291,375,616
```
I'm sure I selected the trained checkpoint path and loaded the model together with the checkpoint via the huggingface backend, but the results are exactly the same as without any training. I then tried the bundled identity.json (left unmodified, only for testing), again selected the trained checkpoint path and loaded the model and checkpoint via the huggingface backend, but the result is still the same: the model keeps answering that it is the Qwen model. Result of training on the bundled identity dataset:
What could be going wrong here? Is it a dataset problem, a hyperparameter problem, or am I simply using it the wrong way? I'm a beginner, thanks in advance 🙏
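(For reference, not part of the original post: one way to check whether the LoRA checkpoint actually changes the model's behaviour, independently of the WebUI, is to load the base model and the adapter directly and compare generations. The sketch below assumes the paths from the log above; the probe question and generation settings are hypothetical.)

```python
# Minimal sketch: generate once with the base Qwen2-VL model and once with the
# trained LoRA adapter attached, to confirm the adapter changes the output.
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel

BASE = "/root/autodl-tmp/snapshots/51c47430f97dd7c74aa1fa6825e68a813478097f"
ADAPTER = "saves/Qwen2-VL-7B-Instruct/lora/train_2024-12-15-22-27-22"

processor = AutoProcessor.from_pretrained(BASE)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
device = model.device

def ask(m, question: str) -> str:
    # Text-only probe; for the image samples, pass images to the processor too.
    messages = [{"role": "user", "content": [{"type": "text", "text": question}]}]
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[prompt], return_tensors="pt").to(device)
    out = m.generate(**inputs, max_new_tokens=64, do_sample=False)
    new_tokens = out[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]

question = "你是谁?"  # hypothetical probe, matching the identity.json test
print("base    :", ask(model, question))

model = PeftModel.from_pretrained(model, ADAPTER)  # attach the trained LoRA weights
print("adapter :", ask(model, question))
```

If the two outputs are identical, the adapter is not being applied at inference time; if they differ but the answers are still unsatisfactory, the issue is more likely the amount or quality of the training data, as suggested in the reply above.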