
Issues: huggingface/trl



Issues list

PPO manual reward functions
#2363 opened Nov 18, 2024 by schmidtj3
Still not supporting for ChatGLM3 maybe
#2362 opened Nov 18, 2024 by fjy01
7 of 9 tasks
Contributing new distillation related trainers
#2361 opened Nov 16, 2024 by YihanCao123
1 of 3 tasks
How to train from scratch? Can you provide the code [labels: ❓ question]
#2356 opened Nov 14, 2024 by sankexin
5 of 9 tasks
Dpo Train Issue: max step from 1000 to 996349
#2355 opened Nov 14, 2024 by seTalent
8 of 9 tasks
BUG in the new PPO trainer
#2353 opened Nov 13, 2024 by TingchenFu
7 of 9 tasks
RLOO Checkpoint Issue
#2342 opened Nov 11, 2024 by asparius
2 of 4 tasks
Multiple Errors with PPOTrainer. error in ppo_trainer.dataloader [labels: 🐛 bug, 🏋 PPO]
#2340 opened Nov 10, 2024 by Debolena7
Difference between SFTTrainer and Seq2seqTrainer [labels: ❓ question, 🏋 SFT]
#2339 opened Nov 9, 2024 by Hyfred
RuntimeError: chunk expects at least a 1-dimensional tensor [labels: 🐛 bug, 🏋 SFT]
#2338 opened Nov 8, 2024 by imrankh46
4 tasks done
DPO Training DataLoader is not shuffled [labels: 🏋 DPO, ✨ enhancement]
#2337 opened Nov 7, 2024 by kaiwenw
4 tasks
Accelerator package version problem [labels: 🐛 bug, 🚀 deepspeed, 🏋 PPO]
#2335 opened Nov 7, 2024 by littleshutong
2 of 4 tasks
RLooTrainer bug when using deepspeed [labels: 🐛 bug, 🚀 deepspeed, 🏋 RLOO]
#2329 opened Nov 6, 2024 by macheng6
2 of 4 tasks
Support for MiniCPM-V Reinforcement Learning with Direct Preference Optimization (DPO) [labels: 🏋 DPO, ❓ question, 👁️ VLM]
#2326 opened Nov 5, 2024 by DarioPTWR
Using a different ref_model from model leads to incorrect results [labels: ✨ enhancement, ❓ question]
#2307 opened Nov 1, 2024 by DarshanDeshpande
2 of 4 tasks
Code migration suggestions [labels: 🏋 DPO, ⏳ needs more info, ❓ question]
#2296 opened Oct 30, 2024 by MonolithFoundation
OOM when finetuning Llama3.2-90B on 8xA100 80GB
#2294 opened Oct 29, 2024 by maximilianmordig
2 of 4 tasks
wrong objective/entropy in RLOOTrainer [labels: 🐛 bug, 🏋 RLOO]
#2281 opened Oct 25, 2024 by serendipity800
1 of 4 tasks