CEWithChunkedOutputLoss does not check division by zero #2341
This is an interesting point. If you pass a batch of entirely masked-out labels to PyTorch's regular cross-entropy calculation, you actually get nan:

```python
import torch

ignore_index = -100
batch_size = 2
num_tokens = 10
vocab_size = 10
logits = torch.randn(batch_size, num_tokens, vocab_size, dtype=torch.bfloat16)
# Labels are all set to ignore_index
labels = torch.full((batch_size, num_tokens), ignore_index, dtype=torch.long)
logits = logits.reshape(-1, logits.size(-1))
labels = labels.reshape(-1)
standard_loss = torch.nn.functional.cross_entropy(
    logits.float(), labels, reduction="mean", ignore_index=ignore_index
)
print(standard_loss)  # tensor(nan)
```

I would lean towards keeping our calculation the same as PyTorch core's implementation of cross-entropy, but would like to hear from @felipemello1. |
After discussing offline with @felipemello1, we agree to stick with the current implementation, which matches what you would expect when using PyTorch's cross_entropy. |
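For context on where the division happens: the chunked loss sums per-chunk losses and normalizes once by the number of non-ignored tokens. The following is not torchtune's actual code, just a minimal sketch of that pattern, assuming the logits arrive pre-split into chunks:

```python
import torch
import torch.nn.functional as F

def chunked_ce(logit_chunks, labels, ignore_index=-100):
    # Split the flattened labels to line up with the logit chunks
    label_chunks = labels.chunk(len(logit_chunks), dim=0)
    # Sum (not mean) the per-chunk losses so normalization happens once
    total_loss = sum(
        F.cross_entropy(lc.float(), tc, reduction="sum", ignore_index=ignore_index)
        for lc, tc in zip(logit_chunks, label_chunks)
    )
    # When every label equals ignore_index, this denominator is 0 and
    # the division yields nan -- the same result F.cross_entropy with
    # reduction="mean" gives in the snippet above
    num_valid = (labels != ignore_index).sum()
    return total_loss / num_valid
```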
@pocca2048, out of curiosity, why would your dataset have no labels? If we change the loss, wouldn't it be a silent error? |
@joecummings, should we change KL loss back so it's consistent? |
@felipemello1, what do you think about #2344? |
Similar to #2225, CEWithChunkedOutputLoss does not check for division by zero either. This makes the loss nan.
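For reference, the kind of guard the issue asks for could look like the hypothetical helper below (the name safe_normalize and its signature are illustrative, not torchtune API). As the discussion above settled, the maintainers chose not to add such a guard, so that the loss stays consistent with PyTorch's cross_entropy:

```python
import torch

def safe_normalize(total_loss: torch.Tensor, labels: torch.Tensor,
                   ignore_index: int = -100) -> torch.Tensor:
    # Hypothetical guard: clamp the valid-token count so a batch where
    # every label is ignore_index returns 0.0 instead of nan
    num_valid = (labels != ignore_index).sum()
    return total_loss / num_valid.clamp(min=1)
```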