Issues: huggingface/trl

Issues list

NashMD trainer sampling policy wrong
Labels: ⚡accelerate, 🐛 bug, ⚡ PEFT
#2781 opened Feb 6, 2025 by zhourunlong
5 tasks done

lora don't work! OOM
Labels: 🐛 bug, ⚡ PEFT
#2780 opened Feb 6, 2025 by zhangguoxin1
5 tasks done

ORPOTrainer crashes due to pickling failure if dataloader_num_workers > 0
Labels: 🐛 bug, 🏋 ORPO
#2779 opened Feb 6, 2025 by kiratp

Allow vllm sub-batching to avoid CUDA out of memory
Labels: ✨ enhancement, 🏋 GRPO
#2775 opened Feb 5, 2025 by cfpark00

GRPO tests failing in multi-device setting on main
Labels: 🐛 bug, 🏋 GRPO
#2774 opened Feb 5, 2025 by tyler-romero
5 tasks done

How to log more metrics with wandb when using GRPO trainer and accelerate
Labels: ⚡accelerate, ✨ enhancement, 🏋 GRPO
#2768 opened Feb 5, 2025 by andrewsiah
5 tasks done

Add Custom Reward Functions To Online DPO (and other methods)
Labels: ✨ enhancement, 🏋 GRPO, 🏋 Online DPO, 🏋 Reward, 🏋 RLOO
#2767 opened Feb 4, 2025 by xzuyn

Wrong quick start guide and value_model error
Labels: 🐛 bug, 📚 documentation, 🏋 PPO
#2764 opened Feb 4, 2025 by elliot-zzh

Llama 3 family of models does not seem to work with RewardTrainer
Labels: ⚡accelerate, ⚡ PEFT, 🏋 Reward
#2758 opened Feb 4, 2025 by JohnGiorgi
5 tasks done

Tracking Liger-Kernel progress for GRPO Loss
Labels: 🏋 GRPO
#2756 opened Feb 3, 2025 by Superskyyy

🐛 Installation Issue: Unable to Install : From provided instruction of contribution.md
Labels: 🐛 bug, 📚 documentation
#2753 opened Feb 3, 2025 by rawathemant246

Possible discrepancy in GRPO loss: Paper vs. implementation (log-prob vs. prob)
Labels: 🐛 bug, 🏋 GRPO, ❓ question
#2752 opened Feb 3, 2025 by liranringel

feat(GRPOTrainer): reward_func return None to skip
Labels: ✨ enhancement, 🏋 GRPO
#2737 opened Feb 2, 2025 by ctjlewis

PLZ make padding_free for DataCollatorForChatML.
Labels: ✨ enhancement, 🏋 GKD, 🙋 help from community wanted
#2736 opened Feb 2, 2025 by YooSungHyun

SFTvsRL SFT Memorizes, RL Generalizes
Labels: ✨ enhancement
#2735 opened Feb 2, 2025 by NickyDark1

GRPO Trainer supports VLMs
Labels: ✨ enhancement, 🏋 GRPO
#2734 opened Feb 2, 2025 by sunildkumar

GKD Example why do not use labels?
Labels: 🏋 GKD, ❓ question
#2732 opened Feb 2, 2025 by YooSungHyun
5 tasks done

Latest TRL code = significantly worse rewards for GRPO training
Labels: 🐛 bug, 🏋 GRPO
#2731 opened Feb 2, 2025 by abacaj
5 tasks done

Training Agents with GRPO
Labels: 🏋 GRPO
#2723 opened Jan 31, 2025 by August-murr

OOM for 7B model on A100 80Gb
Labels: 🐛 bug
#2719 opened Jan 31, 2025 by JohnConnor123
5 tasks done