
Finetuning on the last turn of multi-turn conversations #2545

Open
okhat opened this issue Jan 6, 2025 · 2 comments
Assignees: kashif
Labels: ❓ question Seeking clarification or more information · 🏋 SFT Related to SFT

Comments


okhat commented Jan 6, 2025

Thanks for your great work! I'm updating some old code and would like to use TRL to finetune on the final turn of multi-turn conversations. The simple approaches I tried don't accomplish that.

Let's say I'd like to finetune meta-llama/Llama-3.2-1B-Instruct, though ideally without hardcoding templates for it. I'll keep my report here short since I expect there's a straightforward way I just can't find in the docs.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, apply_chat_template

model_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

train_data = [
   {
      'messages': [
         {'content': 'System!', 'role': 'system'},
         {'content': 'User!', 'role': 'user'},
         {'content': 'Response!', 'role': 'assistant'}
      ]
   }
]

# Split each conversation into a prompt (everything up to the final turn)
# and a completion (the final assistant turn only).
trainset_dict = {
    "prompt": [entry["messages"][:-1] for entry in train_data],
    "completion": [[entry["messages"][-1]] for entry in train_data]
}
trainset = Dataset.from_dict(trainset_dict)
trainset = trainset.map(apply_chat_template, fn_kwargs={"tokenizer": tokenizer})

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(...),
    train_dataset=trainset,
    processing_class=tokenizer,
)

trainer.train()

I've also tried skipping this trainset.map call and instead setting up a data collator, both DataCollatorForCompletionOnlyLM and DataCollatorForChatML, but each fails differently: the former has trouble locating the instruction_template and response_template, and the latter expects examples before tokenization but receives them already tokenized, so it crashes because it can't find the messages key in examples. A rough sketch of the former attempt is below.
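For reference, the DataCollatorForCompletionOnlyLM attempt looked roughly like this (the Llama-3 assistant header as response_template is my assumption, and exactly the kind of model-specific hardcoding I was hoping to avoid):

from trl import DataCollatorForCompletionOnlyLM

# In the Llama-3 chat template, every assistant turn begins with this header,
# so the collator should mask everything before it and compute loss only on
# the tokens that follow it.
response_template = "<|start_header_id|>assistant<|end_header_id|>\n\n"
collator = DataCollatorForCompletionOnlyLM(
    response_template=response_template,
    tokenizer=tokenizer,
)

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(...),
    train_dataset=trainset,  # "messages" format, without the map above
    data_collator=collator,
    processing_class=tokenizer,
)

My guess is that the template-locating trouble comes from the tokenizer encoding the response_template differently with and without surrounding context, but I haven't been able to confirm that.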

Appreciate your help!

August-murr added the ❓ question and 🏋 SFT labels on Jan 6, 2025
kashif (Collaborator) commented Jan 9, 2025

Thanks @okhat, I can have a look and see how to fix it... just debugging currently.

kashif self-assigned this on Jan 9, 2025
okhat (Author) commented Jan 11, 2025

Awesome, thanks @kashif! Looking forward to your findings!
