CEWithChunkedOutputLoss does not check division by zero #2341

Open
pocca2048 opened this issue Feb 4, 2025 · 6 comments

Comments

@pocca2048

Similar to #2225, CEWithChunkedOutputLoss does not check for division by zero either.
This makes the loss NaN.

total_elements = (labels != self.ignore_index).sum()
...
return total_loss / total_elements
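
For reference, a minimal sketch (plain tensors, not the torchtune module itself) of how that denominator reaches zero when every label equals ignore_index:

import torch

ignore_index = -100

# Every label masked out, e.g. the whole sequence is prompt tokens.
labels = torch.full((4,), ignore_index, dtype=torch.long)

total_loss = torch.tensor(0.0)
total_elements = (labels != ignore_index).sum()  # tensor(0)

# 0.0 / 0 -> NaN, which then propagates through the training step.
print(total_loss / total_elements)  # tensor(nan)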
@joecummings
Contributor

This is an interesting point. If you pass a bunch of masked out labels to PyTorch's regular Cross Entropy calculation, you actually get nan, not 0.0.

import torch

ignore_index = -100
batch_size = 2
num_tokens = 10
vocab_size = 10
logits = torch.randn(batch_size, num_tokens, vocab_size, dtype=torch.bfloat16)
# Labels are all set to ignore_index
labels = torch.full((batch_size, num_tokens), ignore_index, dtype=torch.long)
logits = logits.reshape(-1, logits.size(-1))
labels = labels.reshape(-1)
standard_loss = torch.nn.functional.cross_entropy(
    logits.float(), labels, reduction="mean", ignore_index=ignore_index
)
print(standard_loss)  # tensor(nan) -- every target is ignored, so the mean is 0/0

I would lean towards keeping our calculation the same as PyTorch core's regular cross-entropy implementation, but would like to hear from @felipemello1.

@joecummings joecummings added the discussion Start a discussion label Feb 4, 2025
@joecummings joecummings self-assigned this Feb 4, 2025
@joecummings joecummings added the triaged This issue has been assigned an owner and appropriate label label Feb 4, 2025
@joecummings
Contributor

After discussing offline with @felipemello1, we agreed to stick with the current implementation, which matches what you would expect when using torch.nn.CrossEntropyLoss.

@felipemello1
Contributor

@pocca2048, out of curiosity, why would your dataset have no labels? If we change the loss, wouldn't it be a silent error?

@felipemello1
Contributor

@joecummings, should we change the KL loss back so it's consistent?

@pocca2048
Author

@felipemello1
It happens when we use train_on_input=False and the message is so long that the output gets truncated.
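
A minimal sketch of that failure mode with hypothetical lengths (plain tensors, not the torchtune tokenizer/dataset code): with train_on_input=False the prompt tokens are labeled ignore_index, so if truncation cuts the sequence before any response tokens survive, every remaining label is ignored.

import torch

ignore_index = -100
max_seq_len = 8  # hypothetical truncation length

# train_on_input=False: prompt tokens are masked out of the loss.
prompt_len, response_len = 10, 5
labels = [ignore_index] * prompt_len + list(range(response_len))

# Truncation keeps only masked prompt tokens -> no valid targets remain.
labels = torch.tensor(labels[:max_seq_len])
print((labels != ignore_index).sum())  # tensor(0) -> zero denominator -> NaN loss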

@felipemello1
Contributor

What do you think about #2344?
