
Finetuning on the last turn of multi-turn conversations #2545

Open
okhat opened this issue Jan 6, 2025 · 2 comments
Assignees: kashif
Labels: ❓ question Seeking clarification or more information · 🏋 SFT Related to SFT

Comments


okhat commented Jan 6, 2025

Thanks for your great work! I'm updating some old code and would like to use TRL to finetune on the final turn of multi-turn conversations. The simple approaches I tried don't accomplish that.

Let's say I'd like to finetune meta-llama/Llama-3.2-1B-Instruct, though ideally without hardcoding templates for it. I'll keep my report here short since I expect there's a straightforward way I just can't find in the docs.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, apply_chat_template

model_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

train_data = [
   {
      'messages': [
         {'content': 'System!', 'role': 'system'},
         {'content': 'User!', 'role': 'user'},
         {'content': 'Response!', 'role': 'assistant'}
      ]
   }
]

# Split each conversation into a prompt (everything up to the final turn)
# and a completion (the final assistant turn only).
trainset_dict = {
    "prompt": [entry["messages"][:-1] for entry in train_data],
    "completion": [[entry["messages"][-1]] for entry in train_data]
}
trainset = Dataset.from_dict(trainset_dict)
trainset = trainset.map(apply_chat_template, fn_kwargs={"tokenizer": tokenizer})

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(...),
    train_dataset=trainset,
    processing_class=tokenizer,
)

trainer.train()

I've also tried skipping this trainset.map call and instead setting up a data collator, both DataCollatorForCompletionOnlyLM and DataCollatorForChatML, but each fails differently: the former has trouble locating the instruction_template and response_template, and the latter expects examples before tokenization but receives them already tokenized, so it crashes because it can't find the messages key in examples. A rough sketch of the former attempt is below.
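For reference, the DataCollatorForCompletionOnlyLM attempt looked roughly like this (the Llama-3 assistant header as response_template is my assumption, and exactly the kind of model-specific hardcoding I was hoping to avoid):

from trl import DataCollatorForCompletionOnlyLM

# In the Llama-3 chat template, every assistant turn begins with this header,
# so the collator should mask everything before it and compute loss only on
# the tokens that follow it.
response_template = "<|start_header_id|>assistant<|end_header_id|>\n\n"
collator = DataCollatorForCompletionOnlyLM(
    response_template=response_template,
    tokenizer=tokenizer,
)

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(...),
    train_dataset=trainset,  # "messages" format, without the map above
    data_collator=collator,
    processing_class=tokenizer,
)

My guess is that the template-locating trouble comes from the tokenizer encoding the response_template differently with and without surrounding context, but I haven't been able to confirm that.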

Appreciate your help!

August-murr added the ❓ question and 🏋 SFT labels on Jan 6, 2025
kashif (Collaborator) commented Jan 9, 2025

Thanks @okhat, I can have a look and see how to fix it... just debugging currently.

kashif self-assigned this on Jan 9, 2025
okhat (Author) commented Jan 11, 2025

Awesome, thanks @kashif! Looking forward to your findings!
