Please add padding_free support to DataCollatorForChatML
#2736
Labels
✨ enhancement
New feature or request
🏋 GKD
Related to GKD
🙋 help from community wanted
Open invitation for community members to contribute
Feature request
I want to run GKD in a padding-free manner, to cut memory use as far as possible.
The challenge is that GKD involves inference from either the student model or the teacher model. If sequences are packed as tightly as possible within a batch, lining up the student and teacher outputs so their shapes match for the loss computation is likely to become quite complex.
Could someone with strong algorithm skills propose a solution that takes this into account?
Any help would be hugely appreciated! 🙏
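To make the request concrete, here is a minimal sketch of the kind of padding-free layout I have in mind. It is plain Python, not TRL code: `pack_examples`, `unpack`, and the `prompt_ids` / `completion_ids` keys are hypothetical names chosen for illustration, and `cu_seqlens` only borrows the naming convention of variable-length attention kernels. The idea is to concatenate all sequences into one flat row, restart the position ids at every sequence boundary, and keep the cumulative boundaries so per-token outputs can be split back per sequence.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value conventionally ignored by cross-entropy losses


def pack_examples(examples: List[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Concatenate variable-length examples into one flat sequence without pad tokens.

    Each example is a dict with the hypothetical keys "prompt_ids" and "completion_ids".
    """
    input_ids: List[int] = []
    position_ids: List[int] = []
    labels: List[int] = []
    cu_seqlens: List[int] = [0]
    for ex in examples:
        tokens = ex["prompt_ids"] + ex["completion_ids"]
        input_ids.extend(tokens)
        # Positions restart at 0 so each packed sequence is treated independently.
        position_ids.extend(range(len(tokens)))
        # Mask the prompt so only completion tokens contribute to the loss.
        labels.extend([IGNORE_INDEX] * len(ex["prompt_ids"]) + ex["completion_ids"])
        # Sequence i occupies the half-open range [cu_seqlens[i], cu_seqlens[i + 1]).
        cu_seqlens.append(len(input_ids))
    return {
        "input_ids": input_ids,
        "position_ids": position_ids,
        "labels": labels,
        "cu_seqlens": cu_seqlens,
    }


def unpack(flat: List, cu_seqlens: List[int]) -> List[List]:
    """Split any per-token flat list (for example rows of logits) back per sequence."""
    return [flat[start:end] for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


batch = pack_examples(
    [
        {"prompt_ids": [1, 2, 3], "completion_ids": [4, 5]},
        {"prompt_ids": [6], "completion_ids": [7, 8, 9]},
    ]
)
per_sequence = unpack(batch["input_ids"], batch["cu_seqlens"])
```

If the student and teacher both see the same packed tokens, the same `cu_seqlens` boundaries could be used to slice both sets of logits before computing the divergence. The hard part I could not solve is the on-policy case, where generated completions have different lengths and the batch has to be repacked afterwards.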
Motivation
GKD consumes a significant amount of GPU memory. A padding-free implementation could improve both training speed and memory efficiency.
Your contribution
I tried to design the code myself, but I couldn't come up with a good structure, so asking for help seemed like the only option. 😥