The paper says very little about the batch size. I have looked at NVIDIA's Megatron code: when fitting 1.3B parameters on a single V100 (with model parallelism of 2), the maximum batch_size is 16 (the default is 8). Without gradient accumulation, the maximum batch size on 64 GPUs is 512. How did you reach 3072?
We used gradient accumulation: a batch_size of 12 with 8 gradient accumulation steps.
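A minimal sketch of the gradient accumulation pattern described above, written as a plain PyTorch loop rather than the actual Megatron training code; the model, data, and learning rate are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; the real setup is the Megatron model
# with model parallelism, which is omitted here.
model = nn.Linear(1024, 1024)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

micro_batch_size = 12   # per-GPU batch size mentioned above
accumulation_steps = 8  # micro-batches accumulated before each optimizer step

optimizer.zero_grad()
for step in range(accumulation_steps * 4):  # a few dummy iterations
    x = torch.randn(micro_batch_size, 1024)
    loss = model(x).pow(2).mean()
    # Scale the loss so the accumulated gradient matches a single update
    # over micro_batch_size * accumulation_steps samples.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```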
So, is it 2 GPUs for model parallelism and 32 GPUs for data parallelism?
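If that layout is correct, the numbers are consistent: 12 (per-GPU micro-batch) × 8 (accumulation steps) × 32 (data-parallel replicas) = 3072 global batch size.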