I trained Llama 3 on my own conversation dataset with this command:
```bash
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset_path data/alpaca_selected/train \
  --conversation_template llama3 \
  --output_model_path output_models/finetuned_llama3_8b_selected
```
The initial learning rate is 2e-5 and the per-device batch size is 4.
I found that the loss drops sharply at the beginning of every epoch, but within an epoch there is no obvious decrease.
Before this, I trained Llama 2 with:
```bash
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --dataset_path data/alpaca_raw/train \
  --conversation_template llama2 \
  --output_model_path output_models/finetuned_llama2_7b_raw
```
The initial learning rate was 8e-6 and the per-device batch size was 4. The loss curve looked like this: [loss curve screenshot]
I am not sure whether gradient accumulation causes this. I set "gradient_accumulation_steps" in configs/ds_config_zero3.json to 1, but nothing changed.
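For reference, this is roughly the change I made; a minimal excerpt only, assuming the stock ZeRO-3 layout, with every other key left as shipped in configs/ds_config_zero3.json:

```json
{
  "zero_optimization": {
    "stage": 3
  },
  "gradient_accumulation_steps": 1,
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}
```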
Could you help me with this issue? Thank you for your time and attention.
Thanks for your interest in LMFlow! We've observed similar loss curves in some of our experiments. After careful examination, we attributed this to overfitting of the instruction-following dataset on Llama models: at the start of each new epoch the model revisits examples it has already partially memorized, so the loss drops abruptly. Within each epoch, the flat loss curve may come from the large variance of the dataset; decreasing the learning rate or increasing the batch size should help, though the overall trend will likely remain the same.
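For example, a run with a lower learning rate and a larger per-device batch size could look roughly like the sketch below. This assumes run_finetune.sh forwards extra flags to the underlying Hugging Face trainer, so the flag names (--learning_rate, --per_device_train_batch_size) and the values are illustrative rather than prescribed:

```bash
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset_path data/alpaca_selected/train \
  --conversation_template llama3 \
  --output_model_path output_models/finetuned_llama3_8b_selected \
  --learning_rate 1e-5 \
  --per_device_train_batch_size 8
```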
You may check your evaluation/test results; if they look normal, this may not be a serious issue 😄
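One way to sanity-check is to evaluate the finetuned checkpoint on a held-out split. The sketch below is an assumption modeled on the repo's other scripts: the evaluation script name and flags may differ in your version, and data/alpaca_selected/test is a hypothetical held-out split:

```bash
./scripts/run_evaluation.sh \
  --model_name_or_path output_models/finetuned_llama3_8b_selected \
  --dataset_path data/alpaca_selected/test
```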