I trained Llama 3 on my own conversation dataset with this command:
```bash
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset_path data/alpaca_selected/train \
  --conversation_template llama3 \
  --output_model_path output_models/finetuned_llama3_8b_selected
```
The initial learning rate is 2e-5 and the per-device batch size is 4.
I found that the loss drops sharply at the beginning of every epoch, but within an epoch there is no obvious decrease.
Before this, I trained Llama 2 with:
```bash
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --dataset_path data/alpaca_raw/train \
  --conversation_template llama2 \
  --output_model_path output_models/finetuned_llama2_7b_raw
```
The initial learning rate was 8e-6 and the per-device batch size was 4. The loss curve looked like this: [loss curve screenshot]
I am not sure whether gradient accumulation causes this. I set "gradient_accumulation_steps" in configs/ds_config_zero3.json to 1, but nothing changed.
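For reference, this is roughly the change I made; a minimal excerpt only, assuming the stock ZeRO-3 layout, with every other key left as shipped in configs/ds_config_zero3.json:

```json
{
  "zero_optimization": {
    "stage": 3
  },
  "gradient_accumulation_steps": 1,
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}
```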
Could you help me with this issue? Thank you for your time and attention.
Thanks for your interest in LMFlow! We've observed similar loss curves in some of our experiments. After careful examination, we attributed this to overfitting of the instruction-following dataset on Llama models: at the start of each new epoch the model revisits examples it has already partially memorized, so the loss drops abruptly. Within each epoch, the flat loss curve may come from the large variance of the dataset; decreasing the learning rate or increasing the batch size should help, though the overall trend will likely remain the same.
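For example, a run with a lower learning rate and a larger per-device batch size could look roughly like the sketch below. This assumes run_finetune.sh forwards extra flags to the underlying Hugging Face trainer, so the flag names (--learning_rate, --per_device_train_batch_size) and the values are illustrative rather than prescribed:

```bash
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset_path data/alpaca_selected/train \
  --conversation_template llama3 \
  --output_model_path output_models/finetuned_llama3_8b_selected \
  --learning_rate 1e-5 \
  --per_device_train_batch_size 8
```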
You may check your evaluation/test results; if they look normal, this may not be a serious issue 😄
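One way to sanity-check is to evaluate the finetuned checkpoint on a held-out split. The sketch below is an assumption modeled on the repo's other scripts: the evaluation script name and flags may differ in your version, and data/alpaca_selected/test is a hypothetical held-out split:

```bash
./scripts/run_evaluation.sh \
  --model_name_or_path output_models/finetuned_llama3_8b_selected \
  --dataset_path data/alpaca_selected/test
```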