fix ppov2_trainer tensorboard logging bug #1836

DZ9 · 2024-07-16T10:06:38Z

TensorBoard logging fails because the current code never updates the trainer's global step.
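A minimal sketch of why a frozen global step breaks the plots, assuming the logger uses the trainer's global step as the x-coordinate of each scalar. `TrainerState`, `ScalarLog`, and `run` are hypothetical stand-ins written for illustration. They are not the trainer's or TensorBoard's actual classes, and the placement of the increment is an assumption rather than the exact change in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class TrainerState:
    """Hypothetical stand-in for a trainer's state object."""
    global_step: int = 0


@dataclass
class ScalarLog:
    """Hypothetical stand-in for a scalar writer: records (step, value) pairs per tag."""
    points: dict = field(default_factory=dict)

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        # Every value is stored together with the step it was logged at.
        self.points.setdefault(tag, []).append((step, value))


def run(num_updates: int, advance_step: bool) -> ScalarLog:
    """Log one placeholder metric per update, optionally advancing the global step."""
    state, log = TrainerState(), ScalarLog()
    for update in range(num_updates):
        loss = 1.0 / (update + 1)  # placeholder metric, not a real PPO loss
        if advance_step:
            state.global_step += 1  # assumed fix: advance the step on each update
        log.add_scalar("loss", loss, state.global_step)
    return log


def distinct_steps(log: ScalarLog, tag: str) -> list:
    """Return the sorted set of x-coordinates used for a tag."""
    return sorted({step for step, _ in log.points[tag]})


broken = run(5, advance_step=False)
fixed = run(5, advance_step=True)
print(distinct_steps(broken, "loss"))  # [0]
print(distinct_steps(fixed, "loss"))   # [1, 2, 3, 4, 5]
```

In the broken run every point shares the same step, so a plot over steps has nothing to draw a curve through. Once the step advances, each update gets its own x-coordinate.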

Before: [screenshot not preserved]
After this PR: [screenshot not preserved]
Test command:

python examples/scripts/ppo/ppo.py \
    --learning_rate 3e-6 \
    --output_dir ./output \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --total_episodes 100 \
    --model_name_or_path Qwen2/Qwen2-0.5B-Instruct \
    --non_eos_penalty \
    --sft_model_path Qwen2/Qwen2-0.5B-Instruct \
    --reward_model_path Qwen2/Qwen2-0.5B-Instruct \
    --report_to tensorboard \
    --local_rollout_forward_batch_size 1 \
    --logging_strategy steps \
    --logging_steps 1

HuggingFaceDocBuilderDev · 2024-07-16T11:28:51Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

fix ppov2_trainer tensorboard log bugs

82f2d64

kashif approved these changes Jul 16, 2024

kashif merged commit 052a8e1 into huggingface:main Jul 16, 2024
9 checks passed
