generated from fastai/nbdev_template
Issues: huggingface/trl
Issues list
Contributing new distillation related trainers
#2361
opened Nov 16, 2024 by
YihanCao123
1 of 3 tasks
Question about the logprobs of the policy-generated sentences in PPO trainer
#2358
opened Nov 15, 2024 by
yanghh2000
6 of 9 tasks
PPOTrainer with HuggingFace PreTrainedModelWrapper Models
#2357
opened Nov 14, 2024 by
Mrinh212375
7 of 9 tasks
How to train from scratch? Can you provide the code
❓ question
Seeking clarification or more information
#2356
opened Nov 14, 2024 by
sankexin
5 of 9 tasks
KTO: unpair_preference_dataset does not work for datasets with additional columns
#2351
opened Nov 13, 2024 by
LuisVasquezBSC
[Question] Why is Importance Sampling and Clipping applied in RLOO?
#2341
opened Nov 10, 2024 by
shashankg7
Multiple Errors with PPOTrainer. error in ppo_trainer.dataloader
🐛 bug
Something isn't working
🏋 PPO
Related to PPO
#2340
opened Nov 10, 2024 by
Debolena7
Difference between SFTTrainer and Seq2seqTrainer
❓ question
Seeking clarification or more information
🏋 SFT
Related to SFT
#2339
opened Nov 9, 2024 by
Hyfred
RuntimeError: chunk expects at least a 1-dimensional tensor
🐛 bug
Something isn't working
🏋 SFT
Related to SFT
#2338
opened Nov 8, 2024 by
imrankh46
4 tasks done
DPO Training DataLoader is not shuffled
🏋 DPO
Related to DPO
✨ enhancement
New feature or request
#2337
opened Nov 7, 2024 by
kaiwenw
4 tasks
Accelerator package version problem
🐛 bug
Something isn't working
🚀 deepspeed
Related to deepspeed
🏋 PPO
Related to PPO
#2335
opened Nov 7, 2024 by
littleshutong
2 of 4 tasks
RLooTrainer bug when using deepspeed
🐛 bug
Something isn't working
🚀 deepspeed
Related to deepspeed
🏋 RLOO
Related to RLOO
#2329
opened Nov 6, 2024 by
macheng6
2 of 4 tasks
Support for MiniCPM-V Reinforcement Learning with Direct Preference Optimization (DPO)
🏋 DPO
Related to DPO
❓ question
Seeking clarification or more information
👁️ VLM
Related to Visual Language Models
#2326
opened Nov 5, 2024 by
DarioPTWR
RLOOTrainer ignores custom DataCollatorWithPadding in favor of default one
🐛 bug
Something isn't working
🏋 RLOO
Related to RLOO
#2309
opened Nov 2, 2024 by
anch0vy
Using a different ref_model from model leads to incorrect results
✨ enhancement
New feature or request
❓ question
Seeking clarification or more information
#2307
opened Nov 1, 2024 by
DarshanDeshpande
2 of 4 tasks
Code migration suggestions
🏋 DPO
Related to DPO
⏳ needs more info
Additional information or clarification is required to proceed
❓ question
Seeking clarification or more information
#2296
opened Oct 30, 2024 by
MonolithFoundation
OOM when finetuning Llama3.2-90B on 8xA100 80GB
#2294
opened Oct 29, 2024 by
maximilianmordig
2 of 4 tasks
wrong objective/entropy in RLOOTrainer
🐛 bug
Something isn't working
🏋 RLOO
Related to RLOO
#2281
opened Oct 25, 2024 by
serendipity800
1 of 4 tasks
Feature Request: String-Based Comparison Reward model for RLOOTrainer
✨ enhancement
New feature or request
🏋 RLOO
Related to RLOO
#2280
opened Oct 25, 2024 by
HiroshigeAoki