About the learning rate #19

Open

lucasjinreal opened this issue Jul 14, 2023 · 1 comment


@lucasjinreal

From the script provided, I think LongChat uses full-parameter SFT rather than LoRA, but the total effective batch size is just 1 (batch_size * gradient_accum * num_gpus).

But the original Vicuna FastChat training is also full-parameter SFT, with an effective batch size of 128. Why is the learning rate different? Which setting should be adopted if I only have 2× 80 GB GPUs?
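
For concreteness, a minimal sketch of the effective-batch-size arithmetic referenced above; the specific numbers are illustrative, not read from either script:

```python
# Effective batch size = micro-batch per GPU x gradient accumulation x number of GPUs.
per_device_batch_size = 1   # e.g. --per_device_train_batch_size
gradient_accumulation = 1   # e.g. --gradient_accumulation_steps
num_gpus = 1                # world size

effective_batch_size = per_device_batch_size * gradient_accumulation * num_gpus
print(effective_batch_size)  # 1 here, versus 128 in the Vicuna recipe
```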

@DachengLi1 (Owner)

@lucasjinreal I think either is fine. You can go with the largest batch size your GPUs support, either with or without gradient accumulation.
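
As one way to apply that advice, the sketch below (using Hugging Face's TrainingArguments, which FastChat-style scripts build on; the micro-batch size and learning rate are assumptions, not values from this repo) picks the largest per-device batch that fits and uses gradient accumulation to reach a target effective batch of 128 on 2 GPUs:

```python
from transformers import TrainingArguments

num_gpus = 2               # e.g. 2x 80 GB GPUs
per_device_batch = 4       # assumption: largest micro-batch that fits in memory
target_effective_batch = 128

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=per_device_batch,
    # Accumulate so per_device_batch * steps * num_gpus == target_effective_batch.
    gradient_accumulation_steps=target_effective_batch // (per_device_batch * num_gpus),
    learning_rate=2e-5,    # assumption: a common full-SFT value; tune for your setup
)
```

With 4 × 16 × 2 = 128, the optimizer sees the same effective batch whether the 128 examples fit in memory at once or are accumulated across steps.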
