-
I trained with 1000 conversational examples, and the loss became very small, but the results were poor. I then took 10 examples for training, and after about 10 epochs the results were very good. What is the reason for this, and how can I solve it? My config looks like this:
CFG = {
    'vocab_size': 21128,
    'ctx_len': 512,
    'embed_dim': 768,
    'n_heads': 12,
    'n_layers': 12,
    'drop_rate': 0.1
}
-
Hi @qibin0506, let's see what's going on here... first, a few questions:
1. Does this mean you only had 10 training examples? In that case, when you say the loss was very small, do you mean the training or the validation loss? One possible explanation is that with only 10 training examples, your model was overfitting to the training data a lot.
2. It seems you have decreased the vocabulary size. Did you still use the GPT-2 tokenizer? If you change the vocabulary, you'd have to create a new tokenizer and pretrain the model with it before you do the finetuning. This could be another explanation for the bad performance; see the sketch below for a quick way to check whether the tokenizer and the config are consistent.
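A minimal sketch of that sanity check, assuming the standard tiktoken GPT-2 tokenizer (the CFG values are just copied from your question; if you used a different tokenizer, substitute it here):

```python
import tiktoken

CFG = {
    'vocab_size': 21128,   # value from the config in the question
    'ctx_len': 512,
    'embed_dim': 768,
    'n_heads': 12,
    'n_layers': 12,
    'drop_rate': 0.1,
}

# The GPT-2 BPE tokenizer has 50,257 tokens (including <|endoftext|>).
tok = tiktoken.get_encoding("gpt2")
print("tokenizer vocab size:", tok.n_vocab)        # 50257
print("model vocab size:   ", CFG['vocab_size'])   # 21128

# If these numbers don't match, token IDs from the tokenizer can exceed the
# embedding table (index error), or large parts of the embedding matrix are
# never trained -- either way the tokenizer and the model are out of sync.
assert tok.n_vocab == CFG['vocab_size'], (
    "vocab_size in the model config must match the tokenizer's vocabulary"
)
```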
-
Thank you for your reply, @rasbt.
Do you have a portion of the data set aside for validation? What do the training and validation loss curves look like?
I think in this case a dataset of 10k conversational examples may be too small for pretraining, unless you used a different dataset for the pretraining?
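If it helps, here is a minimal sketch of the kind of check I mean, in plain PyTorch; `model` and the `dataset` of (input, target) tensor pairs are hypothetical placeholders for whatever you are actually using, so adapt it to your own training loop:

```python
import torch
import torch.nn.functional as F

# Hypothetical: `dataset` is a list of (input_ids, target_ids) tensor pairs
# built from the conversational data; `model` is the GPT model from the config.
def split_dataset(dataset, val_fraction=0.1):
    n_val = max(1, int(len(dataset) * val_fraction))
    return dataset[n_val:], dataset[:n_val]   # train split, validation split

@torch.no_grad()
def mean_loss(model, data, device="cpu"):
    model.eval()
    total = 0.0
    for input_ids, target_ids in data:
        logits = model(input_ids.to(device))             # (batch, seq, vocab)
        total += F.cross_entropy(
            logits.flatten(0, 1), target_ids.to(device).flatten()
        ).item()
    model.train()
    return total / len(data)

# Inside the training loop, after each epoch:
# train_loss = mean_loss(model, train_data)
# val_loss   = mean_loss(model, val_data)
# A training loss that keeps falling while the validation loss stalls or
# rises is the classic signature of overfitting to a tiny dataset.
```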