I'm wondering whether Megatron-LM supports keeping only the latest N checkpoints. When checkpointing frequently with a small `save_interval`, the storage occupied by checkpoints grows rapidly. It would be very helpful to have a parameter similar to `save_total_limit` in the Transformers library, which caps the number of checkpoints kept on disk by deleting the oldest ones. This would substantially reduce storage consumption.
Thanks!
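Until such an option exists, a similar effect can be approximated by pruning old checkpoint directories outside the training loop. What follows is a minimal sketch under stated assumptions, not Megatron-LM functionality. It assumes Megatron-LM's default on-disk layout: one `iter_XXXXXXX` subdirectory per saved iteration under the `--save` directory, plus a `latest_checkpointed_iteration.txt` tracker file. `prune_checkpoints` is a hypothetical helper. It keeps the newest `keep_last` iteration directories and never deletes the iteration named in the tracker file.

```python
import os
import re
import shutil
import tempfile

# Assumption: Megatron-LM's default naming, e.g. "iter_0000100".
ITER_DIR = re.compile(r"^iter_(\d+)$")
# Assumption: tracker file holding the latest complete iteration number.
TRACKER = "latest_checkpointed_iteration.txt"


def _tracked_iteration(save_dir: str):
    """Return the iteration recorded in the tracker file, or None if absent/unparsable."""
    try:
        with open(os.path.join(save_dir, TRACKER)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def prune_checkpoints(save_dir: str, keep_last: int) -> list[str]:
    """Hypothetical helper: delete all but the newest `keep_last` iteration dirs.

    Returns the names of the directories that were removed.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be >= 1")
    found = []
    for name in os.listdir(save_dir):
        match = ITER_DIR.match(name)
        if match and os.path.isdir(os.path.join(save_dir, name)):
            found.append((int(match.group(1)), name))
    found.sort()  # oldest iteration first
    tracked = _tracked_iteration(save_dir)
    # Everything except the newest `keep_last`, sparing the tracked iteration.
    to_remove = [name for it, name in found[:-keep_last] if it != tracked]
    for name in to_remove:
        shutil.rmtree(os.path.join(save_dir, name))
    return to_remove


if __name__ == "__main__":
    # Demo on a throwaway directory with four fake checkpoints.
    with tempfile.TemporaryDirectory() as d:
        for it in (100, 200, 300, 400):
            os.makedirs(os.path.join(d, f"iter_{it:07d}"))
        print(prune_checkpoints(d, keep_last=2))  # ['iter_0000100', 'iter_0000200']
        print(sorted(os.listdir(d)))  # ['iter_0000300', 'iter_0000400']
```

Running something like this between saves (for example from a periodic job) with `keep_last` of at least 2 means a checkpoint that is still being written is never the only copy left on disk.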