PLZ make padding_free for `DataCollatorForChatML`. #2736

YooSungHyun · 2025-02-02T05:44:23Z

Feature request

I want to run GKD in a padding-free manner, because I want to save as much memory as possible.

The challenge is that GKD runs inference with either the student model or the teacher model during training. If we pack as many sequences as possible into each batch, I expect that aligning the student and teacher output shapes for the loss computation will become quite complex.
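One possible way to keep the shapes aligned, sketched under assumptions rather than taken from TRL: flatten the batch into a single row, restart `position_ids` at 0 for each example, and mask the first label of each example so no token is trained to predict across a boundary (similar to the layout `transformers.DataCollatorWithFlattening` produces). If the student and the teacher both run on the *same* packed row, their per-token outputs line up one-to-one, and a token-level loss can be computed directly on the flat sequence. The names `pack_examples`, `generalized_jsd`, and `packed_gkd_loss` are hypothetical, plain Python lists stand in for tensors, and the loss is a beta-weighted generalized JSD chosen for illustration. On-policy student generations would presumably need to be re-packed before the teacher's forward pass, and the attention kernel must respect the sequence boundaries (e.g. via `position_ids` or `cu_seqlens`); neither is shown here.

```python
import math
from typing import Dict, List, Sequence

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch cross-entropy convention)


def pack_examples(examples: Sequence[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Hypothetical padding-free collator: flatten a batch into one row."""
    input_ids: List[int] = []
    labels: List[int] = []
    position_ids: List[int] = []
    cu_seqlens: List[int] = [0]  # cumulative sequence boundaries, as used by varlen attention
    for ex in examples:
        ids = list(ex["input_ids"])
        lab = list(ex.get("labels", ids))  # copy so the input ids are never mutated
        lab[0] = IGNORE_INDEX  # nothing in this sequence precedes its first token
        input_ids += ids
        labels += lab
        position_ids += list(range(len(ids)))  # positions restart for every example
        cu_seqlens.append(cu_seqlens[-1] + len(ids))
    return {"input_ids": input_ids, "labels": labels,
            "position_ids": position_ids, "cu_seqlens": cu_seqlens}


def generalized_jsd(p_student: List[float], p_teacher: List[float], beta: float) -> float:
    """beta * KL(T || M) + (1 - beta) * KL(S || M), with M = beta * T + (1 - beta) * S."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    m = [beta * t + (1 - beta) * s for s, t in zip(p_student, p_teacher)]
    kl_t = sum(t * math.log(t / mi) for t, mi in zip(p_teacher, m) if t > 0)
    kl_s = sum(s * math.log(s / mi) for s, mi in zip(p_student, m) if s > 0)
    return beta * kl_t + (1 - beta) * kl_s


def packed_gkd_loss(student_probs: List[List[float]], teacher_probs: List[List[float]],
                    labels: List[int], beta: float = 0.5) -> float:
    """Mean token-level JSD over one packed row.

    student_probs[t] and teacher_probs[t] are next-token distributions at position t
    of the SAME packed row, so both are (total_len, vocab) and need no re-padding.
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        if labels[t + 1] == IGNORE_INDEX:  # position t would predict across a boundary
            continue
        total += generalized_jsd(student_probs[t], teacher_probs[t], beta)
        count += 1
    return total / max(count, 1)
```

For example, packing `[5, 6, 7]` and `[8, 9]` gives one row of five tokens with `position_ids == [0, 1, 2, 0, 1]`; the position that would predict token `8` from token `7` is skipped by the loss.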

Could someone with strong algorithmic skills build a solution that takes this into account?
I'm begging you, please! 🙏

Motivation

GKD consumes a significant amount of GPU memory; a padding-free implementation could improve both training speed and memory efficiency.
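A rough way to see where the savings would come from: with padding, every example is processed at the length of the longest one, while a packed row contains only real tokens. Logit memory grows roughly with tokens × vocabulary size, and GKD needs logits from both the student and the teacher, so the wasted fraction applies to both. The helper `padding_overhead` and the lengths below are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import Dict, Sequence


def padding_overhead(lengths: Sequence[int]) -> Dict[str, float]:
    """Token counts for one batch: padded to the longest example vs. packed."""
    padded = len(lengths) * max(lengths)  # every row padded to max length
    packed = sum(lengths)  # only real tokens in a padding-free row
    return {"padded_tokens": padded, "packed_tokens": packed,
            "wasted_fraction": 1 - packed / padded}


# Hypothetical batch of four chat examples with very different lengths.
print(padding_overhead([512, 128, 64, 32]))
```

Here 2048 padded tokens shrink to 736 packed tokens, so about 64% of the padded positions carry no information.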

Your contribution

I tried to design the code, but since I'm not very smart, I couldn't come up with a good structure, so I thought the only option was to ask someone for help. 😥

YooSungHyun · 2025-02-02T05:46:56Z

@qgallouedec

I saw that you implemented padding-free DPO. Would you be able to contribute to GKD as well? I know it's a lot to ask, but I'd really appreciate it. 😅

github-actions bot added 🏋 GKD Related to GKD ✨ enhancement New feature or request 🙋 help from community wanted Open invitation for community members to contribute labels Feb 2, 2025
