Checkpoint on final step of training even when it doesn't coincide with `save_freq`. #284

alexander-soare · 2024-06-20T07:06:38Z

What this does

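See the title: a checkpoint is now saved on the final step of training even when that step doesn't coincide with `save_freq`.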

How it was tested / how to checkout and try

I ran training with 10 offline steps (and `save_freq` > 10) and verified that the checkpoint was saved.

JOB_NAME=test

python lerobot/scripts/train.py \
    hydra.job.name=$JOB_NAME \
    hydra.run.dir=outputs/train/$(date +'%Y-%m-%d/%H-%M-%S')_${JOB_NAME} \
    env=pusht \
    policy=diffusion \
    training.save_checkpoint=true \
    training.offline_steps=10 \
    training.save_freq=20000 \
    training.eval_freq=10000 \
    training.log_freq=50 \
    eval.n_episodes=50 \
    eval.batch_size=50 \
    wandb.enable=false \
    wandb.disable_artifact=true \
    device=cuda \
    use_amp=true
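
For reviewers who want the intended behavior spelled out, here is a minimal sketch of the save condition. It is not the code from this PR: `should_save_checkpoint` and its parameter names are hypothetical, and it assumes steps are counted from 1 so that the final step equals the total number of training steps.

```python
def should_save_checkpoint(step: int, save_freq: int, total_steps: int) -> bool:
    """Hypothetical helper: save every `save_freq` steps and also on the final step.

    Assumes `step` is 1-indexed, so the last step equals `total_steps`.
    """
    if step <= 0:
        return False
    is_periodic_save = step % save_freq == 0
    is_final_step = step == total_steps
    return is_periodic_save or is_final_step


# Mirror the test run above: 10 offline steps with save_freq=20000.
saved_steps = [s for s in range(1, 11) if should_save_checkpoint(s, 20000, 10)]
print(saved_steps)
```

With the values from the command above, the periodic condition never fires (10 < 20000), so the only step that triggers a save in this sketch is the final one.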

marinabar

Looks good to me!

ready for review

3ed9971

marinabar approved these changes Jun 20, 2024


alexander-soare merged commit 9aa4cdb into huggingface:main Jun 20, 2024
5 checks passed

alexander-soare deleted the checkpoint_on_final_step branch June 20, 2024 07:27
